Robust and Adaptive Guidance and Control Laws for
Missile Systems


1  Objective  of  the  Research  Effort 

The objective of this three-year study is to develop robust and adaptive guidance and control
laws for homing missiles, mechanizable with near-future computer technology, which can
satisfy system objectives in the presence of large uncertainties and nonlinearities. Over the
past years, considerable progress has been made in resolving some of the fundamental issues
in homing guidance. Of particular importance was the development of new filter structures
tailored to the passive homing engagement, and of new target models and kinematic
pseudo-measurements, which modified the new filter algorithm and induced a new adaptive
homing guidance law. During the last three years, in support of these innovations, robust
filters and control schemes which further enhance system performance were developed based
upon a stochastic control problem known as the linear-exponential-Gaussian problem and a
related deterministic approach called the disturbance attenuation problem. Most important,
emerging from this work is a new structure for adaptive control and a unifying framework
for developing midcourse and terminal homing missile guidance schemes under uncertainty.

2 Status of the Research Effort: Robust and Adaptive Guidance
and Control Laws for Missile Systems

An active research program in systems theory has been established at UCLA in the Mechanical,
Aerospace and Nuclear Engineering Department. One aspect of this research program
centers on the development of robust high performance guidance and control schemes for
such new concepts as bank-to-turn missile guidance systems. This activity considers new
theoretical innovations and tailors them to missile system applications. Under this grant, the
game theoretic approach to control and guidance synthesis, estimation with bearings-only
measurements, and adaptive control were investigated. The missile system is an important
conduit for motivating and using the results of this effort. Our approach is to develop
realistic guidance systems based upon models of special dynamic and measurement systems.
Previous results include new state reconstruction algorithms based upon bearings-only
measurements [1,2] and new missile guidance rules [3]. The missile guidance system
presented in [3] represents the integration of our efforts over the years in estimation, target
modeling and missile guidance laws. In this grant, robust filters and control schemes which
further enhance system performance were developed based upon a stochastic control problem
known as the linear-exponential-Gaussian problem and a related deterministic approach
called the disturbance attenuation problem. Emerging from this work is a new structure for
adaptive control and a unifying framework for developing missile guidance schemes during
midcourse and terminal homing under uncertainty.

2.1  Game  Theoretic  Synthesis 

Since the game theoretic approach [4] is formulated in state space, it generalizes the
currently popular H-infinity robustness techniques for time-invariant systems to finite-time and
time-varying systems with both partial and full information. This is done by first defining
a disturbance attenuation function as the ratio of an L2 function of the outputs over
an L2 function of the disturbance inputs. A controller is to be determined which bounds
the disturbance attenuation function in the presence of all input disturbances constrained
to be in L2. Furthermore, the game theoretic approach is related to the stochastic control
problem of minimizing the expected value of an exponential of a quadratic argument
subject to a Gauss-Markov process [5]. An essential feature of the solution to this so-called
Linear-Exponential-Gaussian (LEG) problem is the realization that the expectation
operator of an exponential form with respect to Gaussian random variables is equivalent to the
extremization of the augmented quadratic argument formed from the exponential cost and
the Gaussian probability density function. This leads to a game theoretic interpretation
of this stochastic control problem. An application of the game theoretic approach to the
problem of integration of the missile autopilot with the guidance system is given in [6].
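
The equivalence between the Gaussian expectation of an exponential and the extremization of the augmented quadratic argument can be checked numerically in the simplest setting. The sketch below is only a scalar illustration, not the formulation of [5]: it assumes a single Gaussian variable x with mean `mean` and variance `var`, a cost exp(0.5*theta*q*x^2), and hypothetical function names. It compares a direct quadrature of the expectation with the value obtained from the stationary point of the augmented argument.

```python
import math


def leg_expectation_quadrature(theta, q, mean, var, n=20001, width=12.0):
    """Approximate E[exp(0.5*theta*q*x**2)] for x ~ N(mean, var) by the trapezoidal rule."""
    sigma = math.sqrt(var)
    lo = mean - width * sigma
    step = 2.0 * width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * step
        # Augmented quadratic argument: exponential cost plus Gaussian log-density.
        arg = 0.5 * theta * q * x * x - (x - mean) ** 2 / (2.0 * var)
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(arg)
    return total * step / math.sqrt(2.0 * math.pi * var)


def leg_expectation_extremum(theta, q, mean, var):
    """Same expectation, obtained from the extremum of the augmented quadratic argument."""
    s = 1.0 - theta * q * var
    if s <= 0.0:
        raise ValueError("expectation is unbounded unless 1 - theta*q*var > 0")
    x_star = mean / s  # stationary point of the augmented argument
    arg_star = 0.5 * theta * q * x_star ** 2 - (x_star - mean) ** 2 / (2.0 * var)
    # The remaining Gaussian integral only contributes the factor 1/sqrt(s).
    return math.exp(arg_star) / math.sqrt(s)
```

In this scalar case the expectation is determined, up to a factor that does not depend on the mean, by the extremal value of the augmented argument, which is the property that gives the stochastic problem its game theoretic interpretation.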

2.2 Robust Game Theoretic Synthesis in the Presence of Uncertain
Initial States and System Parameters

A game theoretic approach to linear control synthesis [4] is developed where initial states and
parameter uncertainties are included as adversaries [7]. An implicit approach [8], closely
related to $\mu$-synthesis, characterizes the parameter dependence in the system coefficient
matrices as linear. Although process and measurement uncertainties are included in the game cost
criterion by quadratic penalty functions, the initial states and parameters are constrained
to lie on or within given multi-dimensional ellipsoids. It is shown that the suboptimal $H_\infty$
controller is not a saddle point strategy with this cost criterion even if the parameters are
known. To solve this new dynamic game problem, a general linear control structure of given
dimension is assumed for the output or partial information control problem, so that the
control is a function of a set of constant control parameters. In this game the adversaries are
assumed to be knowledgeable about the value and the strategy of the control, although the
controller is only aware of the measurement history. It is shown that under this circumstance,
the dynamic game can be transformed into a parameter game problem between the control
parameter set and the unknown plant parameters if there exists an admissible control parameter
set. If a saddle point strategy for the parameter game problem exists, then the saddle point
inequalities for the dynamic game problem are satisfied for process and measurement
disturbances as well as for all initial states and parameters within their respective ellipsoids. This
saddle point inequality guarantees a level of performance robustness as given by the value
of the cost at the saddle point. In relation to the dynamic game problem, a disturbance
attenuation problem is introduced where two types of disturbance attenuation parameters
are used; one is associated with process and measurement disturbances and the other is
associated with initial states. It is shown that these disturbance attenuation parameters are
closely related to the $H_\infty$ norm and the cost criterion of the dynamic game.

2.3  A  LEG  Estimator 

By probing further into the relationship between game theory and the stochastic LEG
problem, very significant results have been obtained. By employing a worst-case performance
measure, a game theoretic approach is used to determine a discrete-time state estimator
[9,10]. The continuous estimator is reported in [11]. Although the order of minimization
and maximization does not affect the saddle point value for this class of games, the order is
critical in obtaining game theoretic strategies. Two interesting strategies result: an $l_2$
estimator, which is the deterministic equivalent of the Kalman filter, and an estimator which
satisfies a bound on a disturbance attenuation function. If a corresponding LEG problem is
constructed where the quadratic argument of the exponential is the estimation error, then
these two estimators under somewhat different assumptions still result and give the same
saddle value of the augmented quadratic argument formed from the exponential. Note that
it is well known (Sherman's Theorem) that minimizing, with respect to a function of the
measurement history, the expectation of any symmetric, unimodal function of the estimation
error (such as the exponential) subject to a Gauss-Markov system results in a conditional
mean estimator, i.e., the Kalman filter. However, if the estimation error is replaced by the
sum of estimation errors, where these errors are functions of the measurement history up
to the index of time in the sum, then Sherman's theorem does not hold, and the Kalman
estimator is not minimizing. The stochastic LEG problem nevertheless has a unique
solution, which is the $H_\infty$ estimator. This new filter generalizes the Kalman filter. Its statistical
properties are now being investigated.

2.4  The  Centralized  and  Decentralized  LEG  Problem 

A third activity to be reported is that a uniform approach to the solution of the LEG
problem constrained to the classical information pattern, the one-step delayed information pattern,
and the one-step delayed information-sharing pattern has been obtained [12]. The results vastly
simplify the results of [13]. The one-step delayed information-sharing pattern allows a
decentralized control structure where at each stage of the dynamic programming algorithm a
stochastic static team problem is solved. Furthermore, both convex and unimodal exponential
functions are included, which is an important extension of the work in [14]. Application of
these results to sensor fusion in missile guidance systems should produce robust performance
in the presence of electronic countermeasures.

2.5  A  Game  Theoretic  Dual  Control  Problem 

Beginning  with  a  disturbance  attenuation  function,  a  game  is  formulated  for  a  special  class 
of  dual  control  problems.  A  scalar  bilinear  system  is  considered  where  the  control  coefficient 
is  unknown  and  only  an  uncertain  measurement  of  the  state  variable  is  available.  The 
resulting  controller,  which  is  constrained  to  the  measurement  history,  is  a  function  of  the 
state  and  parameter  estimates  and  their  associated  pseudo-error  variance.  This  controller 
depends  upon  the  real  roots  of  a  fifth-order  polynomial  whose  coefficients  are  also  functions 
of both the estimates and pseudo-error variance. When the paper [15] was written, it was
thought that the controller had a dual control property because of the explicit appearance
of the pseudo-variances. However, this is not the case for this formulation. Nevertheless,
the resulting controller based only on the initial formulation as a disturbance attenuation
problem did lead, without any approximations, to an interesting and mechanizable controller.
This motivated new work in adaptive control discussed in the next subsection.

2.6 A Disturbance Attenuation Approach to Adaptive Control

In Rhee and Speyer [4], the disturbance attenuation problem is shown to extend the results
of $H_\infty$ analysis from time-invariant systems on infinite intervals to time-varying systems on
finite intervals. The disturbance attenuation approach attempts to limit the
effect of any possible combination of disturbance and uncertainty to some small multiple
of the disturbance. To this end, a disturbance attenuation function is constructed. The
disturbance attenuation function is then converted to a performance index similar to those
in more common optimal control problems. The problem then becomes a zero-sum game,
in which the initial uncertainties and the system disturbances are considered as intelligent
adversaries attempting to maximize the performance index, with the control playing their
opponent, trying to minimize.
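
The conversion from an attenuation bound to a game cost can be written out in a few lines. The block below is a sketch under stated assumptions rather than a quotation of [4]: it takes the finite-time setting with terminal weight $Q_T$, initial-state weight $P_0$, disturbance weights $W$ and $V$, and a bound $-1/\theta$ with $\theta < 0$, and it assumes $\|y\|^2 = \|x\|^2_Q + \|u\|^2_R$ and a zero nominal initial state.

```latex
% Disturbance attenuation function: weighted output energy over weighted
% disturbance and initial-state energy.
D = \frac{\|x(T)\|^2_{Q_T} + \int_0^T \left(\|x\|^2_Q + \|u\|^2_R\right) dt}
         {\|x(0)\|^2_{P_0^{-1}} + \int_0^T \left(\|w\|^2_{W^{-1}} + \|v\|^2_{V^{-1}}\right) dt}
  < -\frac{1}{\theta}, \qquad \theta < 0 .

% The denominator is positive for nonzero (w, v, x(0)), so multiplying through
% and collecting terms on one side gives the performance index
J = \frac{1}{2}\left[ \|x(0)\|^2_{(\theta P_0)^{-1}} + \|x(T)\|^2_{Q_T}
    + \int_0^T \left\{ \|x\|^2_Q + \|u\|^2_R
    + \frac{1}{\theta}\left(\|w\|^2_{W^{-1}} + \|v\|^2_{V^{-1}}\right) \right\} dt \right] < 0 .

% The control minimizes J while the disturbances and the initial state maximize it;
% the bound D < -1/theta holds whenever the maximized cost stays negative.
```

Because $\theta$ is negative, the disturbance terms enter $J$ with a negative sign, which is what makes the maximization over the adversaries well posed when the attenuation level is achievable.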

In the past, this approach has produced very complex results when applied to anything other than
a few special cases [15]. Speyer, Fruchter, and Hahn [15] applied it to a scalar system and,
for a constant unknown control parameter, obtained a closed-form solution that hinged upon
the solution of a fifth-order polynomial. In [16], an alternate approach which generalizes the
results of Bernhard [17] appears to produce some simplification. By using the Principle of
Optimality, the problem is split into two parts, each to be solved separately, and rejoined
by an algebraic "connection" condition. The resulting problems are much simpler than
the original, although in general still quite complex. In many cases, however, they can be
reduced to a manageable level. In particular, for the case of constant parameters, with the
system uncertainty limited to unknown control parameters, a solution suitable for real-time
application is derived and the existence of a saddle point is shown.

The  controller  resulting  from  this  approach  is  a  function  both  of  the  state  and  parameter 
estimates  and  the  associated  curvature  matrices  which  act  as  pseudo-variances.  Although 
this  approach  represents  a  type  of  separation,  no  explicit  assumption  is  made  of  certainty 
equivalence, as found in the development of current adaptive control algorithms.

2.7 A System Characterization of Positive Real Conditions

Necessary and sufficient conditions for positive realness in terms of state space matrices are
presented in [18] under the assumption of complete controllability and complete observability
of square systems with independent inputs. As an alternative to the positive real lemma
and to the s-domain inequalities, these conditions provide a recursive algorithm for testing
positive realness that results in a set of simple algebraic conditions. By relating the positive
real property to an associated variational problem, the paper outlines a unified derivation of
necessary and sufficient conditions for optimality of both singular and nonsingular problems.
Based on this algorithm, a synthesis of a positive real system via output feedback is presented.
This work is motivated by the need to determine cost criteria which allow control design based
upon phase considerations rather than magnitude, the result of minimizing L2 induced norms.
Current work is in determining the characteristics of plants and compensators for which the
closed-loop transition matrix between desired output and disturbance inputs is positive
real or dissipative.

A Game Theoretic Approach to a Finite-Time
Disturbance Attenuation Problem

Ihnseok  Rhee  and  Jason  L.  Speyer,  Fellow,  IEEE 


Abstract: A disturbance attenuation problem over a finite-time
interval is considered by a game theoretic approach where
the control, restricted to a function of the measurement history,
plays against adversaries composed of the process and measurement
disturbances, and the initial state. A zero-sum game,
formulated as a quadratic cost criterion subject to linear
time-varying dynamics and measurements, is solved by a calculus of
variation technique. By first maximizing the quadratic cost criterion
with respect to the process disturbance and initial state, a
full information game between the control and measurement
residual subject to the estimator dynamics results. The resulting
solution produces an n-dimensional compensator which compactly
expresses the controller as a linear combination of the
measurement history. Furthermore, the controller requires the
solution to two Riccati differential equations (RDE). For the
linear saddle strategy of the controller, necessary and sufficient
conditions for the saddle point to be strictly concave with
respect to all disturbances and initial conditions, and sufficient
conditions for various process disturbance strategies to satisfy
the saddle point condition are given. A disturbance attenuation
problem is solved based on the results of the game problem. For
time-invariant systems it is shown that under certain conditions
the time-varying controller becomes time-invariant on the
infinite-time interval. The resulting controller satisfies an $H_\infty$ norm
bound.

I. Introduction

Recently, a time-domain control synthesis procedure
for $H_\infty$ control problems has been developed in [1], [3].
Riccati equations arising in the linear quadratic (LQ) game
problem [12]-[14], [23] and Linear-Exponential-Gaussian
(LEG) problem [15]-[18] play a key role in this synthesis
procedure. These results motivated the formulation of a
finite-time interval $H_\infty$ control problem for time-varying
systems based on a linear quadratic game approach.
References [4], [5] considered a finite-time $H_\infty$ control problem
where the initial condition is given as zero. The full state
information discrete-time $H_\infty$ control problem is considered in
[6], [7] by using existing linear quadratic game results.

In this paper, a finite-time interval disturbance attenuation
problem for a time-varying system with uncertainty in the
initial conditions of state is considered based on a LQ game
theoretic formulation where the control plays against adversaries
composed of the process and measurement disturbances
and initial conditions. The general problem presented
here is formulated as one of partial information such that the
control is restricted to be a function of only the measurement
history. A standard calculus of variation procedure, which was
shown in [13], [24] to be a useful tool for the full information
LQ game problem, is adopted to solve the LQ game problem
with partial information. The solution to this game problem
generalizes many of the standard results given in [13], [14],
[24]. The resulting compensator for the controller is
n-dimensional, requiring the solution to two Riccati differential
equations (RDE). In particular, the solution is identical to
that given by the continuous-time formulation of the
Linear-Exponential-Gaussian (LEG) problem [16]. The formulation
and solution of this finite-time time-varying game problem is
given in Section III. By first maximizing the quadratic cost
criterion with respect to the process disturbance and initial
state, a full information game results between the control and
measurement residual subject to the estimator dynamics.
Three different saddle strategies for the process disturbance
are considered and sufficient conditions for the existence of a
saddle point solution are determined. In addition, necessary
and sufficient conditions are determined for the saddle point
to be strictly concave with respect to all nonzero variations of
the disturbances and initial conditions from their saddle point
strategies. Based on these results, conditions for the finite-time
disturbance attenuation problem of Section IV are developed.

The finite-time solution is specialized in Section V to the
infinite-time time-invariant solution based upon the two
RDEs produced in Section III. In particular, it is shown in
Section III-C that, assuming the existence of a nonnegative
definite solution to the algebraic Riccati equation (ARE), the
solution to the RDE with certain initial conditions converges
to the minimal nonnegative definite symmetric solution of the
ARE. The results of Section III-C provide a proper development
for the infinite-time time-invariant controller. In Section
V it is shown that the n-dimensional compensators constructed
from all the nonnegative solutions to the two AREs,
satisfying certain conditions, satisfy the $H_\infty$ norm bound.

Throughout this paper, $\|\cdot\|_A$ denotes the Euclidean norm
weighted by $A$; $\mathcal{H}_x$ means $\partial\mathcal{H}/\partial x$; $D > (\ge)\ 0$ means
that $D$ is a positive (nonnegative) definite matrix; $D > E\ (\ge E)$
means that $D - E > 0\ (\ge 0)$.

During the course of the review process the authors became
aware of similar results for the finite-time disturbance
attenuation problem that were independently obtained in
[8], [9].

II. Problem Statement

Consider a linear time-varying system described by

$$\dot{x}(t) = A(t)x(t) + B(t)u(t) + \Gamma(t)w(t) \tag{1}$$

$$z(t) = H(t)x(t) + \Gamma_1(t)v(t) \tag{2}$$

$$y(t) = C(t)x(t) + C_1(t)u(t) \tag{3}$$

where $x$ is the $n \times 1$ state vector; $u$ is the $m \times 1$ input
vector; $z$ is the $p \times 1$ measurement vector; $w$ and $v$ are the
$q \times 1$ and $p \times 1$ input disturbance vectors, respectively; $y$
is the controlled output vector; and it is assumed that $\Gamma_1$ and
$R = C_1^T C_1$ are nonsingular and the initial condition $x(0)$ is
unknown. All matrices have appropriate dimensions and are
time-varying.

Define the measurement history up to $t$ as
$$Z_t = \{z(s),\ 0 \le s \le t\}.$$

The admissible control is restricted to be a function of only
$Z_t$. Let $\mathcal{U}$ denote the set of admissible controls, and $\mathcal{U}_l \subset \mathcal{U}$
the subset of linear functions of $Z_t$.

The disturbance attenuation problem to be considered is to
find a control $u \in \mathcal{U}$ such that

$$\|x(T)\|^2_{Q_T} + \int_0^T \|y\|^2\,dt < -\frac{1}{\theta}\left[\|x(0)\|^2_{P_0^{-1}} + \int_0^T \left(\|w\|^2_{W^{-1}} + \|v\|^2_{V^{-1}}\right) dt\right] \tag{4}$$

for all $w, v \in L_2[0,T]$, $x(0) \in R^n$ such that $(w(t), v(t), x(0)) \ne 0$
over all $t \in [0,T]$. It is assumed that $\theta$ is a
negative constant, the final time $T$ is fixed, $P_0$, $W$, and $V$ are
time-varying positive definite matrices, and $Q_T$ is a nonnegative
definite matrix.

In order to solve the above problem, we will first consider
as in [6], [7] the related linear quadratic game problem of
finding $u^* \in \mathcal{U}$, $v^*, w^* \in L_2[0,T]$ and $x^*(0) \in R^n$ satisfying
the saddle point condition

$$J(u^*, v, w, x(0)) \le J(u^*, v^*, w^*, x^*(0)) \le J(u, v^*, w^*, x^*(0)) \tag{5}$$

for all $u \in \mathcal{U}$, $v, w \in L_2[0,T]$, $x(0) \in R^n$, where

$$J(u,v,w,x(0)) = \frac{1}{2}\left[\|x(0)-\hat{x}_0\|^2_{(\theta P_0)^{-1}} + \|x(T)\|^2_{Q_T} + \int_0^T\left\{\|x\|^2_Q + \|u\|^2_R + \frac{1}{\theta}\left(\|w\|^2_{W^{-1}} + \|v\|^2_{V^{-1}}\right)\right\}dt\right] \tag{6}$$

where $\hat{x}_0$ is a given vector and $Q = C^T C$. The left-hand
side inequality plays an important role in the above disturbance
rejection problem.
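
To make the roles of the two sides of (4) and of the game cost (6) concrete, the following sketch evaluates them along a simulated trajectory. It is a minimal illustration under assumptions not taken from the paper: a scalar time-invariant plant, forward-Euler integration, an open-loop control signal (a special case of an admissible control), and hypothetical argument names.

```python
def simulate_cost(a, b, g, q, r, q_T, p0, w_wt, v_wt, theta,
                  x0, xhat0, u_of_t, w_of_t, v_of_t, T=1.0, n=1000):
    """Euler-simulate dx/dt = a*x + b*u + g*w and return (J, lhs, rhs).

    lhs and rhs are the two sides of the attenuation inequality (4);
    J is the game cost (6) with ||y||^2 taken as q*x^2 + r*u^2.
    """
    dt = T / n
    x = x0
    out_energy = 0.0   # integral of q*x^2 + r*u^2
    dist_energy = 0.0  # integral of w^2/W + v^2/V
    for k in range(n):
        t = k * dt
        u, w, v = u_of_t(t), w_of_t(t), v_of_t(t)
        out_energy += (q * x * x + r * u * u) * dt
        dist_energy += (w * w / w_wt + v * v / v_wt) * dt
        x += (a * x + b * u + g * w) * dt
    lhs = q_T * x * x + out_energy
    rhs = -(x0 * x0 / p0 + dist_energy) / theta
    J = 0.5 * ((x0 - xhat0) ** 2 / (theta * p0) + q_T * x * x
               + out_energy + dist_energy / theta)
    return J, lhs, rhs
```

With `xhat0 = 0` the cost satisfies `J = 0.5 * (lhs - rhs)`, so the attenuation inequality (4) holds for a given disturbance exactly when `J` is negative.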

III. The Linear Quadratic Game Problem

A saddle point strategy can be obtained by solving two
optimization problems

$$\min_{u\in\mathcal{U}}\ \max_{v}\ \max_{w}\ \max_{x(0)} J(u,v,w,x(0)) = \bar{J}^* \tag{7}$$

$$\max_{v}\ \max_{w}\ \max_{x(0)}\ \min_{u\in\mathcal{U}} J(u,v,w,x(0)) = \underline{J}^*. \tag{8}$$

The solutions to (7) and (8) produce saddle point strategies
when $\bar{J}^* = \underline{J}^*$.
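
The requirement that the two orders of optimization give the same value can be seen on a static toy game. The sketch below is an illustration only and is not the dynamic game of this paper: it assumes a hypothetical scalar cost that is convex in the minimizer u and, because theta is negative, concave in the maximizer w, and it estimates both values by brute-force grid search.

```python
def static_game_values(r=1.0, c=1.0, theta=-1.0, lo=-2.0, hi=2.0, n=401):
    """Grid estimates of min-max and max-min for
    J(u, w) = 0.5*r*(u - 1)^2 + c*u*w + w^2 / (2*theta), with theta < 0."""
    grid = [lo + i * (hi - lo) / (n - 1) for i in range(n)]

    def cost(u, w):
        return 0.5 * r * (u - 1.0) ** 2 + c * u * w + w * w / (2.0 * theta)

    # Analogue of (7): the minimizer commits first, the maximizer responds.
    upper = min(max(cost(u, w) for w in grid) for u in grid)
    # Analogue of (8): the maximizer commits first, the minimizer responds.
    lower = max(min(cost(u, w) for u in grid) for w in grid)
    return upper, lower
```

For the default values both estimates agree at the saddle point u = w = 0.5, where the cost is 0.25; making the cost non-concave in w (for example with a positive theta) would generally separate the two values.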

First, the optimization problem (7) is considered. By
substitution of the constraint (2) into (6), we can change the
optimization problem of (7) to the following problem:

$$\min_{u\in\mathcal{U}}\ \max_{z}\ \max_{w}\ \max_{x(0)}\ \frac{1}{2}\left[\|x(0)-\hat{x}_0\|^2_{(\theta P_0)^{-1}} + \|x(T)\|^2_{Q_T} + \int_0^T\left\{\|x\|^2_Q + \|u\|^2_R + \frac{1}{\theta}\left(\|w\|^2_{W^{-1}} + \|z - Hx\|^2_{\bar{V}^{-1}}\right)\right\}dt\right] \tag{9}$$

subject to equation (1), where $\bar{V} = \Gamma_1 V \Gamma_1^T$. This cost criterion
for the continuous deterministic game is reminiscent of
the criterion constructed from the argument of the exponential
in the discrete LEG problem [17], [18].

A. Maximization with Respect to w and x(0)

To solve the problem consider first the maximization of $J$
with respect to $w$ and $x(0)$ for a given $z$ and fixed strategy
$u \in \mathcal{U}$ for which the variations of $u$ and $z$ vanish. The
resulting cost criterion will then be minimized and maximized
with respect to $u$ and $z$, respectively. Let

$$J_1 = \max_{w}\ \max_{x(0)} J.$$

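The standard variational procedure [24] is formally applied to
this problem. A vector Lagrange multiplier function $\lambda$ is
introduced to adjoin (1) to (9). The first variation of $J$ for
given $u$ and $z$ is given by

$$\delta J = \frac{1}{\theta}\left[P_0^{-1}\{x(0)-\hat{x}_0\} + \lambda(0)\right]^T\delta x(0) + \left[Q_T x(T) - \frac{1}{\theta}\lambda(T)\right]^T\delta x(T) + \int_0^T\left[\left(\mathcal{H}_x + \frac{1}{\theta}\dot{\lambda}^T\right)\delta x + \mathcal{H}_w\,\delta w\right]dt$$

where $\mathcal{H}$ is the Hamiltonian, defined by

$$\mathcal{H}(x,w,\lambda) = \frac{1}{2}\left[x^TQx + u^TRu + w^T(\theta W)^{-1}w + (z-Hx)^T(\theta\bar{V})^{-1}(z-Hx)\right] + \frac{1}{\theta}\lambda^T(Ax + Bu + \Gamma w).$$

The first-order necessary conditions for a maximum are

$$\dot{\lambda} = -\theta\mathcal{H}_x^T,\quad \lambda(T) = \theta Q_T x(T),\quad \lambda(0) = -P_0^{-1}\{x(0)-\hat{x}_0\} \tag{10}$$

$$\mathcal{H}_w = 0 \ \Rightarrow\ w = -W\Gamma^T\lambda. \tag{11}$$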

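After substituting (11) into (1), the following two point
boundary value problem is obtained:

$$\begin{bmatrix}\dot{x}\\ \dot{\lambda}\end{bmatrix} = \begin{bmatrix}A & -\Gamma W\Gamma^T\\ -(H^T\bar{V}^{-1}H + \theta Q) & -A^T\end{bmatrix}\begin{bmatrix}x\\ \lambda\end{bmatrix} + \begin{bmatrix}Bu\\ H^T\bar{V}^{-1}z\end{bmatrix} \tag{12}$$

with

$$x(0) = \hat{x}_0 - P_0\lambda(0),\quad \lambda(T) = \theta Q_T x(T). \tag{13}$$

Since the two point boundary value problem is linear, the
solution can be obtained by the sweep method [24]. Let $x^*$
and $\lambda^*$ denote the solutions to (12) and (13). We assume that
the solution $x^*$ satisfies

$$x^* = \hat{x} - P\lambda^* \tag{14}$$

where $\hat{x}$ and $P$ are to be determined. The substitution of the
differentiation of (14) into (12) yields

$$\left(\dot{P} - PA^T - AP + P(H^T\bar{V}^{-1}H + \theta Q)P - \Gamma W\Gamma^T\right)\lambda^* = \dot{\hat{x}} - A\hat{x} - Bu - PH^T\bar{V}^{-1}(z - H\hat{x}) + \theta PQ\hat{x}.$$

Therefore, if we choose $\hat{x}$ and $P$ to satisfy the following:

$$\dot{\hat{x}} = A\hat{x} + Bu + PH^T\bar{V}^{-1}(z - H\hat{x}) - \theta PQ\hat{x},\quad \hat{x}(0) = \hat{x}_0 \tag{15}$$

$$\dot{P} = PA^T + AP - P(H^T\bar{V}^{-1}H + \theta Q)P + \Gamma W\Gamma^T,\quad P(0) = P_0 \tag{16}$$

then (14) becomes an identity. From (13) and (14) we obtain

$$\lambda^*(T) = \theta Q_T\{I + \theta P(T)Q_T\}^{-1}\hat{x}(T) \tag{17}$$

$$x^*(T) = \{I + \theta P(T)Q_T\}^{-1}\hat{x}(T) \tag{18}$$

where we assume $\{I + \theta P(T)Q_T\}$ is nonsingular. Then, we
can calculate $\lambda^*(t)$ from

$$\dot{\lambda}^* = -\left[A - P(H^T\bar{V}^{-1}H + \theta Q)\right]^T\lambda^* + H^T\bar{V}^{-1}(z - H\hat{x}) - \theta Q\hat{x} \tag{19}$$

with the final condition given by (17).

The second variation is considered to determine additional
necessary conditions of optimality. The second variation
along the extremal path, $\delta^2 J$, is given as

$$\delta^2 J = \delta x(0)^T(\theta P_0)^{-1}\delta x(0) + \delta x(T)^TQ_T\,\delta x(T) + \frac{1}{\theta}\int_0^T\left[\delta x^T(H^T\bar{V}^{-1}H + \theta Q)\delta x + \delta w^TW^{-1}\delta w\right]dt. \tag{20}$$

From the constraint (1), we obtain

$$\delta\dot{x} = A\,\delta x + \Gamma\,\delta w. \tag{21}$$

Adding the zero quantity

$$\left[\delta x^T(\theta P)^{-1}\delta x\right]_0^T - \int_0^T\frac{d}{dt}\left(\delta x^T(\theta P)^{-1}\delta x\right)dt = 0$$

to (20) and using (21), we obtain the second variation $\delta^2 J$ as
a perfect square of the form

$$\delta^2 J = \delta x(T)^T\{(\theta P(T))^{-1} + Q_T\}\delta x(T) + \frac{1}{\theta}\int_0^T\|\delta w - W\Gamma^TP^{-1}\delta x\|^2_{W^{-1}}\,dt$$

where we assume $P(t)$ is nonsingular over $t \in [0,T]$. Since
the performance index is quadratic, for a maximum the
second variation should be negative for all variations $\delta x(0)$
and $\delta w$ not vanishing simultaneously. Consider the variation

$$\delta w = W\Gamma^TP^{-1}\delta x. \tag{22}$$

Then, from (21)

$$\delta\dot{x} = (A + \Gamma W\Gamma^TP^{-1})\,\delta x. \tag{23}$$

Hence

$$\delta x(T) = \Phi(T,0)\,\delta x(0)$$

where $\Phi$ denotes the transition matrix of the linear differential
equation (23). Thus

$$\delta^2 J = \frac{1}{\theta}\,\delta x(0)^T\Phi^T(T,0)\{P^{-1}(T) + \theta Q_T\}\Phi(T,0)\,\delta x(0) < 0,\quad \forall\,\delta x(0) \ne 0$$

which implies that for a maximum it is necessary that

$$P^{-1}(T) + \theta Q_T > 0 \tag{24}$$

since $\Phi(T,0)$ is nonsingular. Also, (24) implies $P(T) > 0$
since $Q_T \ge 0$.

We will show that if the condition (24) holds and $P^{-1}$ is
finite, the second variation is negative for all variations $\delta x(0)$
and $\delta w$ not vanishing simultaneously. If (24) is satisfied,
then for all variations

$$\delta^2 J \le 0$$

where the equality holds for the variation given in (23) and

$$\delta x(T) = 0. \tag{25}$$

However, for the variation (22) and (25), (23) leads to
$\delta x(t) = 0$ and $\delta w(t) = 0$ over $t \in [0,T]$ if $P^{-1}$ is finite.
That is, $w^*(t) = -W\Gamma^T\lambda^*(t)$ and $x^*(0) = \hat{x}_0 - P_0\lambda^*(0)$
produce the maximum.

B. The Solution to Problem (7)

To perform the minimization and maximization with respect
to $u$ and $z$, first evaluate $J_1$. By using (10), (11), and
(14), $J_1$ can be represented as

$$J_1 = \frac{1}{2}\left[\|\lambda^*(0)\|^2_{\theta^{-1}P_0} + \|x^*(T)\|^2_{Q_T} + \int_0^T\left\{\|\hat{x} - P\lambda^*\|^2_Q + \|u\|^2_R + \|\lambda^*\|^2_{\theta^{-1}\Gamma W\Gamma^T} + \|z - H(\hat{x} - P\lambda^*)\|^2_{(\theta\bar{V})^{-1}}\right\}dt\right].$$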
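Adding the zero quantity

$$\frac{1}{2\theta}\left[\lambda^{*T}P\lambda^*\right]_0^T - \frac{1}{2\theta}\int_0^T\frac{d}{dt}\left(\lambda^{*T}P\lambda^*\right)dt = 0$$

to $J_1$ yields (using (16) and (19))

$$J_1 = \frac{1}{2}x^*(T)^TQ_Tx^*(T) + \frac{1}{2\theta}\lambda^*(T)^TP(T)\lambda^*(T) + \frac{1}{2}\int_0^T\left[\hat{x}^TQ\hat{x} + u^TRu + (z - H\hat{x})^T(\theta\bar{V})^{-1}(z - H\hat{x})\right]dt.$$

From (17) and (18)

$$x^*(T)^TQ_Tx^*(T) + \frac{1}{\theta}\lambda^*(T)^TP(T)\lambda^*(T) = \hat{x}(T)^TS_T\hat{x}(T) \tag{26}$$

where

$$S_T = Q_T\{I + \theta P(T)Q_T\}^{-1}.$$

By the matrix inversion lemma $S_T$ becomes

$$S_T = Q_T - \theta Q_T\{P^{-1}(T) + \theta Q_T\}^{-1}Q_T.$$

Therefore, condition (24) implies that $S_T \ge 0$. Now, by
using (26), $J_1$ can be represented in terms of $\hat{x}$ as

$$J_1 = \frac{1}{2}\hat{x}(T)^TS_T\hat{x}(T) + \frac{1}{2}\int_0^T\left[\hat{x}^TQ\hat{x} + u^TRu + (z - H\hat{x})^T(\theta\bar{V})^{-1}(z - H\hat{x})\right]dt.$$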

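The minimization and maximization can be performed with
respect to $u$ and the measurement residual $\bar{v} = z - H\hat{x}$:

$$\min_{u}\ \max_{\bar{v}}\ \frac{1}{2}\left[\hat{x}(T)^TS_T\hat{x}(T) + \int_0^T\left\{\hat{x}^TQ\hat{x} + u^TRu + \bar{v}^T(\theta\bar{V})^{-1}\bar{v}\right\}dt\right] \tag{27}$$

subject to

$$\dot{\hat{x}} = \bar{A}\hat{x} + Bu + \bar{\Gamma}\bar{v},\quad \hat{x}(0) = \hat{x}_0 \tag{28}$$

where $\bar{A} = A - \theta PQ$ and $\bar{\Gamma} = PH^T\bar{V}^{-1}$. We obtain (28)
from (15) by using the definition of $\bar{v}$. Finally, the problem
reduces to the well-known deterministic game problem. The
optimal feedback strategies $u^* \in \mathcal{U}$ and $\bar{v}^*$ for (27) are given
in [14], [23] as

$$u^* = -R^{-1}B^TS\hat{x} \tag{29}$$

$$\bar{v}^* = -\theta\bar{V}\bar{\Gamma}^TS\hat{x} = -\theta HPS\hat{x} \tag{30}$$

if the Riccati equation

$$-\dot{S} = S\bar{A} + \bar{A}^TS - S(BR^{-1}B^T + \theta\bar{\Gamma}\bar{V}\bar{\Gamma}^T)S + Q$$
$$\phantom{-\dot{S}} = S(A - \theta PQ) + (A - \theta PQ)^TS - S(BR^{-1}B^T + \theta PH^T\bar{V}^{-1}HP)S + Q \tag{31}$$

with the terminal condition $S(T) = S_T$ has a solution over
$t \in [0,T]$. It is noted that open-loop strategies for $u$ may
make the cost (27) unbounded without certain conditions, while
the feedback strategy (29) makes the cost bounded (see
[23]). The cost criterion using $u^*$ and $\bar{v}^*$ is given as

$$J_1(u^*, \bar{v}^*) = \frac{1}{2}\hat{x}_0^TS(0)\hat{x}_0. \tag{32}$$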
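
The compensator therefore rests on two sweeps: the estimator equation (16) is integrated forward from P(0) = P_0, and the control equation (31) is integrated backward from S(T) = S_T, with P entering the coefficients of (31). The sketch below illustrates this ordering for a scalar, time-invariant example; the scalar restriction, the fixed-step Runge-Kutta integration and all function names are assumptions made for illustration, not the authors' implementation.

```python
def riccati_pair(a, b, g, h, q, r, w, vbar, theta, p0, q_T, T, n):
    """Return (P, S) on the grid t_k = k*T/n for scalar versions of (16) and (31)."""
    dt = T / n
    half = dt / 2.0

    def p_dot(p):
        # (16): dP/dt = 2aP - P^2 (h^2/Vbar + theta*q) + g^2 W
        return 2.0 * a * p - p * p * (h * h / vbar + theta * q) + g * g * w

    # Forward sweep on a half-step grid so the backward pass can read P at midpoints.
    p_fine = [p0]
    for _ in range(2 * n):
        p = p_fine[-1]
        k1 = p_dot(p)
        k2 = p_dot(p + 0.5 * half * k1)
        k3 = p_dot(p + 0.5 * half * k2)
        k4 = p_dot(p + half * k3)
        p_fine.append(p + half * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0)

    def s_rev(p, s):
        # (31) in reverse time:
        # -dS/dt = 2S(a - theta*P*q) - S^2 (b^2/R + theta*P^2*h^2/Vbar) + q
        return (2.0 * s * (a - theta * p * q)
                - s * s * (b * b / r + theta * p * p * h * h / vbar) + q)

    s_grid = [0.0] * (n + 1)
    # Terminal condition S_T = Q_T (1 + theta*P(T)*Q_T)^(-1)
    s_grid[n] = q_T / (1.0 + theta * p_fine[-1] * q_T)
    for k in range(n, 0, -1):
        s = s_grid[k]
        k1 = s_rev(p_fine[2 * k], s)
        k2 = s_rev(p_fine[2 * k - 1], s + 0.5 * dt * k1)
        k3 = s_rev(p_fine[2 * k - 1], s + 0.5 * dt * k2)
        k4 = s_rev(p_fine[2 * k - 2], s + dt * k3)
        s_grid[k - 1] = s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p_fine[::2], s_grid


def control_gain(b, r, s):
    """Scalar feedback gain in u* = -R^(-1) B^T S xhat, equation (29)."""
    return -b * s / r
```

Setting `theta = 0` decouples the two equations and recovers the familiar filter and regulator Riccati equations, which gives a convenient sanity check; for negative `theta` the value `0.5 * xhat0**2 * S[0]` is the scalar form of the cost (32).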

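Let $x^*(t)$, $\lambda^*(t)$, and $\hat{x}^*(t)$ denote the optimal trajectories of $x$, $\lambda$, and $\hat{x}$, respectively; that is, the solutions for $x$, $\lambda$, and $\hat{x}$ with $u = u^*$ and $\nu = \nu^*$. Substituting (29) and (30) into (15) and (19) yields

$$\dot{\hat{x}}^* = (A - \theta PQ)\hat{x}^* - BR^{-1}B^T S\hat{x}^* - \theta PH^T\mathcal{V}^{-1}HPS\hat{x}^*, \quad \hat{x}^*(0) = \hat{x}_0 \tag{33}$$

$$\dot{\lambda}^* = -(A - \theta PQ)^T\lambda^* + H^T\mathcal{V}^{-1}HP(\lambda^* - \theta S\hat{x}^*) - \theta Q\hat{x}^* \tag{34}$$

with $\lambda^*(T) = \theta S_T\hat{x}^*(T)$. $\hat{x}^*(t)$ can be calculated from (33) independently. Observe that

$$\frac{d}{dt}(\theta S\hat{x}^*) = -(A - \theta PQ)^T(\theta S\hat{x}^*) - \theta Q\hat{x}^*. \tag{35}$$

Comparing (34) and (35) gives

$$\lambda^*(t) = \theta S\hat{x}^*(t), \quad t \in [0, T]. \tag{36}$$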
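The identity (35) can be checked directly, as sketched below. The check assumes that (31) and (33) read as the Riccati equation for $S$ and the closed-loop estimator dynamics, respectively. The shorthand $\bar{A} = A - \theta PQ$ follows (28), while $N = BR^{-1}B^T + \theta PH^T\mathcal{V}^{-1}HP$ is introduced here only for this check and is not a symbol from the paper.

```latex
\begin{align*}
\dot{S} &= -S\bar{A} - \bar{A}^T S + S N S - Q, \qquad
\dot{\hat{x}}^* = (\bar{A} - N S)\hat{x}^*, \\
\frac{d}{dt}\left(\theta S \hat{x}^*\right)
  &= \theta \dot{S}\hat{x}^* + \theta S \dot{\hat{x}}^* \\
  &= \theta\left(-S\bar{A} - \bar{A}^T S + S N S - Q\right)\hat{x}^*
     + \theta S\left(\bar{A} - N S\right)\hat{x}^* \\
  &= -\bar{A}^T\left(\theta S \hat{x}^*\right) - \theta Q \hat{x}^* .
\end{align*}
```

With $\lambda^* = \theta S\hat{x}^*$ the middle term of (34) vanishes, so (34) and (35) describe the same trajectory with the same terminal value.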

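Substituting (29) and (36) into (12) and (14) yields

$$\dot{x}^* = Ax^* - (BR^{-1}B^T S + \theta\Gamma W\Gamma^T S)\hat{x}^*, \quad x^*(0) = \{I - \theta P_0 S(0)\}\hat{x}_0.$$

Observe that

$$\frac{d}{dt}\left[(I - \theta PS)\hat{x}^*\right] = A(I - \theta PS)\hat{x}^* - (BR^{-1}B^T S + \theta\Gamma W\Gamma^T S)\hat{x}^*.$$

By inspection we obtain the optimal state trajectory as

$$x^*(t) = (I - \theta PS)\hat{x}^*(t), \quad t \in [0, T]. \tag{37}$$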

From (2), (30), (37) and the definition of $\nu$, the optimal trajectories for $v$ and $x(0)$, denoted as $v^*(t)$ and $x^*(0)$, respectively, are given as

$$v^*(t) = \Gamma_1^{-1}\{\nu^*(t) + H(\hat{x}^*(t) - x^*(t))\} = 0, \quad t \in [0, T] \tag{38}$$

$$x^*(0) = \{I - \theta P_0 S(0)\}\hat{x}_0. \tag{39}$$

From (11), (36), and (37), we obtain the optimal trajectory for $w$, denoted as $w^*(t)$, in terms of $\hat{x}^*(t)$ as

$$w^*(t) = -\theta W\Gamma^T S\hat{x}^*(t), \quad t \in [0, T] \tag{40}$$

or in terms of $x^*(t)$, by assuming $I - \theta PS$ is nonsingular¹ over $[0, T]$, as

$$w^*(t) = -\theta W\Gamma^T\Pi x^*(t), \quad t \in [0, T] \tag{41}$$

where

$$\Pi = S(I - \theta PS)^{-1}. \tag{42}$$

¹ Conditions that ensure this inverse exists are given in the next section.

Equations (38), (39), (40), and (41) denote the open-loop strategies for $v$, $x(0)$, and $w$. The open-loop strategy for $u$ is obtained by substituting $\hat{x}^*$ for $\hat{x}$ in (29). However, since the cost (27) may become unbounded for the open-loop strategy for $u$, the closed-loop strategy (29), confined to be a function of the measurement history, should be used. Sufficient conditions for the existence of various saddle point strategies are given in Section III-D. The cost along $u^*$, $v^*$, $w^*$, and $x^*(0)$ is obtained from (32) as

$$J^* = J(u^*, v^*, w^*, x^*(0)) = \frac{1}{2}\hat{x}_0^T S(0)\hat{x}_0. \tag{43}$$

C.  Some  Properties  of  Riccati  Equations 

Consider a Riccati differential equation of the form

$$-\dot{X} = \mathcal{A}^T X + X\mathcal{A} - X(\mathcal{B}\mathcal{B}^T - \mathcal{D}\mathcal{D}^T)X + \mathcal{Q}, \quad X(T) = X_T \ge 0, \quad t \le T \tag{44}$$

where the coefficient matrices $\mathcal{A}$, $\mathcal{B}$, $\mathcal{D}$, and $\mathcal{Q}$ are time-varying matrices. It is assumed that $\mathcal{Q}$ and $X_T$ are nonnegative definite matrices. Note that the RDE (16) has this form if we change the independent variable from $t$ to $\tau = T - t$.

Let $X(t, X_T)$ denote the solution to the RDE (44), if it exists.

Lemma 1: Suppose that $X(t, X_T)$ exists over $[t_0, T]$. If $X_T \ge 0$ $(> 0)$, then $X(t, X_T)$ is nonnegative (positive) definite over $[t_0, T]$.

Proof: Suppose that $X(t, X_T)$ is a solution to (44) with $X_T \ge 0$ $(> 0)$ over $[t_0, T]$. Then, $X(t, X_T)$ satisfies the Riccati differential equation

$$-\dot{X} = \mathcal{A}^T X + X\mathcal{A} - X\mathcal{B}\mathcal{B}^T X + \bar{\mathcal{Q}}, \quad X(T) = X_T \tag{45}$$

for all $t \in [t_0, T]$ where

$$\bar{\mathcal{Q}}(t) = \mathcal{Q} + X(t, X_T)\mathcal{D}\mathcal{D}^T X(t, X_T).$$

Note that $\bar{\mathcal{Q}}(t)$ is nonnegative definite over $[t_0, T]$. Hence, [22, Theorem 2.1] shows that $X(t, X_T)$ is a unique and nonnegative (positive) definite solution to (45). □

Hereafter, we assume that the matrices $\mathcal{A}$, $\mathcal{B}$, $\mathcal{D}$, and $\mathcal{Q}$ are constant matrices. Then, the associated algebraic Riccati equation is represented as

$$0 = \mathcal{A}^T X + X\mathcal{A} - X(\mathcal{B}\mathcal{B}^T - \mathcal{D}\mathcal{D}^T)X + \mathcal{Q}. \tag{46}$$

We introduce the notion of a minimal nonnegative definite solution to the algebraic Riccati equation (46) (see also [11]).

Definition 1: A nonnegative definite symmetric matrix $X_1$ satisfying the algebraic Riccati equation (46) is said to be the minimal nonnegative definite symmetric solution to (46) if, given any other nonnegative definite symmetric solution $X_2$ to (46), $X_1 \le X_2$.

Lemma 2 follows from [14].

Lemma 2: Suppose that there exists a nonnegative definite matrix $X_1$ satisfying the ARE (46). Then, $X(t, 0)$ exists for all $t \le T$ and is nondecreasing as $t$ decreases. Moreover, $X(t, 0) \to \underline{X} \le X_1$ as $t \to -\infty$, where $\underline{X}$ denotes the minimal nonnegative definite solution to the ARE (46).

Lemma 3: Suppose that for $0 \le X_{T1} \le X_{T2}$ there exist $X(t, X_{T1})$ and $X(t, X_{T2})$ over $[t_0, T]$. Then, $0 \le X(t, X_{T1}) \le X(t, X_{T2})$.

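Proof: Consider a value functional

$$J(u, w, x_0) = x(T)^T X_{T2}\, x(T) + \int_t^T \{x^T\mathcal{Q}x + u^T u - w^T w\}\, d\tau$$

with a differential equation

$$\dot{x} = \mathcal{A}x + \mathcal{B}u + \mathcal{D}w, \quad x(t) = x_0.$$

By using the completion of squares argument for this value functional, we obtain (see [14])

$$J(u, w, x_0) = x_0^T X(t, X_{T2})x_0 + \int_t^T \{\|u + \mathcal{B}^T X(\tau, X_{T2})x\|^2 - \|w - \mathcal{D}^T X(\tau, X_{T2})x\|^2\}\, d\tau \tag{47}$$

and

$$J(u, w, x_0) = x_0^T X(t, X_{T1})x_0 + x(T)^T(X_{T2} - X_{T1})x(T) + \int_t^T \{\|u + \mathcal{B}^T X(\tau, X_{T1})x\|^2 - \|w - \mathcal{D}^T X(\tau, X_{T1})x\|^2\}\, d\tau. \tag{48}$$

Since (47) and (48) hold for any $u$ and $w$, let $u = -\mathcal{B}^T X(\tau, X_{T2})x$ and $w = \mathcal{D}^T X(\tau, X_{T1})x$. Subtracting (48) from (47) yields

$$x_0^T\{X(t, X_{T2}) - X(t, X_{T1})\}x_0 = x(T)^T(X_{T2} - X_{T1})x(T) + \int_t^T \{\|u + \mathcal{B}^T X(\tau, X_{T1})x\|^2 + \|w - \mathcal{D}^T X(\tau, X_{T2})x\|^2\}\, d\tau.$$

This completes the proof since $X_{T2} - X_{T1} \ge 0$ by assumption. □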

$X(t, 0)$ is nondecreasing as $t$ decreases and $X(t, 0) \le X(t, X_T)$ for all $X_T \ge 0$ by Lemma 3. Hence, it is clear that if $X(t, X_T)$ does not exist for $t \le t_0$, then it increases without bound as $t$ decreases to $t_0$. On the other hand, if we can find an upper bound for $X(t, X_T)$ for all $t \le T$, the solution to (44) exists since it cannot escape the upper bound.

Corollary 1: Suppose that there exists a nonnegative definite matrix $X_1$ satisfying the ARE (46). If $0 \le X_T \le X_1$, then $X(t, X_T)$ exists and $0 \le X(t, X_T) \le X_1$ for all $t \le T$. Moreover, if $0 \le X_T \le \underline{X}$, where $\underline{X}$ is the minimal nonnegative definite solution to the ARE (46), then $X(t, X_T) \to \underline{X}$ as $t \to -\infty$.

Proof: For the first part it suffices to show that $X(t, X_T)$, if it exists, is bounded above by $X_1$. Since $X_1 = X(t, X_1)$ and $0 \le X_T \le X_1$, it follows from Lemma 2 and Lemma 3 that $0 \le X(t, X_T) \le X_1$.

From the above discussion and Lemma 3, if $0 \le X_T \le \underline{X}$, then $X(t, X_T)$ exists and $X(t, 0) \le X(t, X_T) \le \underline{X}$. Since $X(t, 0) \to \underline{X}$ as $t \to -\infty$, $X(t, X_T) \to \underline{X}$ as $t \to -\infty$. □
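The monotone convergence described by Lemma 2, Lemma 3, and Corollary 1 can be illustrated numerically. The sketch below integrates a scalar instance of the RDE (44) backward in time with an explicit Euler scheme. The coefficients, the function names, the step size, and the choice of Euler integration are all illustrative assumptions rather than anything prescribed by the paper; with these coefficients the ARE $0.5X^2 - 3X + 4 = 0$ has nonnegative roots 2 and 4, so 2 plays the role of the minimal solution.

```python
def riccati_rhs(x, a, b2, d2, q):
    # Scalar form of (44): -dX/dt = 2*a*X - (b2 - d2)*X**2 + q
    return 2.0 * a * x - (b2 - d2) * x * x + q


def integrate_backward(x_T, a, b2, d2, q, horizon=20.0, dt=1e-3):
    # Reversed time s = T - t turns (44) into dX/ds = riccati_rhs(X).
    path = [x_T]
    for _ in range(int(round(horizon / dt))):
        path.append(path[-1] + dt * riccati_rhs(path[-1], a, b2, d2, q))
    return path


# Illustrative coefficients (assumed): ARE roots are 2 (minimal) and 4.
a, b2, d2, q = -1.5, 0.5, 1.0, 4.0
path_from_0 = integrate_backward(0.0, a, b2, d2, q)  # terminal value X_T = 0
path_from_1 = integrate_backward(1.0, a, b2, d2, q)  # terminal value X_T = 1
print(path_from_0[-1], path_from_1[-1])
```

Both trajectories start at or below the minimal root and stay ordered while approaching it, which is the behavior the corollary describes for $0 \le X_T \le \underline{X}$.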

The solutions developed in Sections III-A and III-B, described by (29), (38), (39), (40), and (41), require the solutions of two coupled RDE's (16) and (31). However, these two coupled RDE's can be decoupled into two independent RDE's by a transformation.

The following assumptions result from Sections III-A and III-B.

Assumption 1:

a) There exists a solution $P(t)$ to the RDE (16) over $[0, T]$.

b) $P^{-1}(T) + \theta Q_T > 0$.

c) There exists a solution $S(t)$ to the RDE (31) over $[0, T]$.

Note that from Lemma 1, Assumption 1-a) assures $P(t) > 0$ over $[0, T]$, and Assumptions 1-b) and 1-c) assure $S(t) \ge 0$. Moreover, under Assumption 1 it is easily verified that $\Pi(t)$ defined in (42) satisfies an RDE

$$-\dot{\Pi} = A^T\Pi + \Pi A - \Pi(BR^{-1}B^T + \theta\Gamma W\Gamma^T)\Pi + Q, \quad \Pi(T) = Q_T. \tag{49}$$

Again Lemma 1 shows that $\Pi(t) \ge 0$ over $[0, T]$.

Assumption 2:

a) There exists a solution $P(t)$ to the RDE (16) over $[0, T]$.

b) There exists a solution $\Pi(t)$ to the RDE (49) over $[0, T]$.

c) $P^{-1}(t) + \theta\Pi(t) > 0$ over $[0, T]$.

Claim 1: Assumption 1 and Assumption 2 are equivalent.

Proof: Suppose Assumption 1 holds. Then, $P(t) > 0$ and $\Pi(t) \ge 0$, hence $P^{-1} - \theta S > 0$ over $[0, T]$. By using the matrix inversion lemma

$$P^{-1} + \theta\Pi = P^{-1}(P^{-1} - \theta S)^{-1}P^{-1} > 0.$$

Suppose Assumption 2 holds. Then, $P^{-1}(T) + \theta\Pi(T) = P^{-1}(T) + \theta Q_T > 0$. It can be verified that $S = \Pi(I + \theta P\Pi)^{-1}$ satisfies the RDE (31). □

From these relations and by solving the decoupled RDE's (16) and (49) we can construct the controller for each player. For later use we define

$$M = (I + \theta P\Pi)^{-1}P. \tag{50}$$

Then $M(t) > 0$ and satisfies the RDE

$$\dot{M} = M(A - \theta\Gamma W\Gamma^T\Pi)^T + (A - \theta\Gamma W\Gamma^T\Pi)M - M(H^T\mathcal{V}^{-1}H + \theta\Pi BR^{-1}B^T\Pi)M + \Gamma W\Gamma^T \tag{51}$$

with $M(0) = (I + \theta P_0\Pi(0))^{-1}P_0$. It is noted that

$$M = (I - \theta PS)P. \tag{52}$$
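The algebraic relations among $S$, $\Pi$, $P$, and $M$ in (42), (50), (52), and the proof of Claim 1 can be spot-checked numerically. The sketch below uses small hand-written $2 \times 2$ helpers so that it needs no external library. The matrices `P` and `S`, the value of `theta`, and all helper names are arbitrary illustrative assumptions; they are not solutions of the RDE's in the text, since only the algebra is being checked.

```python
def matmul(X, Y):
    # Product of two 2x2 matrices stored as nested lists.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv(X):
    # Inverse of a 2x2 matrix via the adjugate formula.
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def add(X, Y, scale=1.0):
    # Returns X + scale * Y.
    return [[X[i][j] + scale * Y[i][j] for j in range(2)] for i in range(2)]


def max_abs_diff(X, Y):
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))


I2 = [[1.0, 0.0], [0.0, 1.0]]
theta = -1.0                      # assumed negative, as in the example section
P = [[2.0, 0.5], [0.5, 1.0]]      # assumed symmetric positive definite
S = [[1.0, 0.2], [0.2, 0.5]]      # assumed symmetric nonnegative definite

I_minus_tPS = add(I2, matmul(P, S), scale=-theta)    # I - theta*P*S
Pi = matmul(S, inv(I_minus_tPS))                     # (42)
I_plus_tPPi = add(I2, matmul(P, Pi), scale=theta)    # I + theta*P*Pi
M = matmul(inv(I_plus_tPPi), P)                      # (50)
M_alt = matmul(I_minus_tPS, P)                       # (52)
S_back = matmul(Pi, inv(I_plus_tPPi))                # S = Pi (I + theta P Pi)^{-1}
lhs = add(inv(P), Pi, scale=theta)                   # P^{-1} + theta*Pi
print(max_abs_diff(M, M_alt), max_abs_diff(S, S_back), max_abs_diff(lhs, inv(M)))
```

The three printed differences should be at rounding level, reflecting that $(I + \theta P\Pi)^{-1} = I - \theta PS$ and $P^{-1} + \theta\Pi = M^{-1}$ whenever the inverses exist.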

D.  Saddle  Point  Strategy 

From the results of Section III-B, two strategies for the game problem can be deduced from (29), (38), (39), (40), and (41).

Strategy 1:

$$u^* = -R^{-1}B^T S\hat{x}, \quad w^* = -\theta W\Gamma^T\Pi x, \quad v^* = 0, \quad x^*(0) = \{I + \theta P_0\Pi(0)\}^{-1}\hat{x}_0. \tag{53}$$

Strategy 2:

$$u^* = -R^{-1}B^T S\hat{x}, \quad w^* = -\theta W\Gamma^T S\hat{x}, \quad v^* = 0, \quad x^*(0) = \{I + \theta P_0\Pi(0)\}^{-1}\hat{x}_0.$$

In the above strategies, $x$ and $\hat{x}$ denote the states of the dynamic equation (1) and the state estimator (15). Note that we have made $u^*$ a function of the estimate in both strategies, but $w^*$ is a function of the state in Strategy 1 and a function of the estimate in Strategy 2. We will show that Strategy 1 forms a saddle point under Assumption 1, while an additional condition is required for Strategy 2 to produce the saddle point.

By  adding  the  zero  quantity 

0=  -^[x^nx]l+y^j^(x^nx)dt 
to  (6),  J  is  rqnesented  as  a  perfect  square  of  the  form 
J(u,  V,  w,  x(0))  =  x(0)  -  x(0)|lJ,TO 

+  7(ll»»'  +  »»»"r^nxBV-. 

V 

-l-BHl^.))*-  (54) 

Define  {  =  x  -  (/  -  $PS)2.  Then,  {  satisfies 

{  =  (A  -  ■¥  ePSBH 

+  Tir-  MH‘^V-%v  (55) 

with  the  initial  condition  ((0)  «  x(0)  -  {/-  0Po^(O)}-^o 
where  fi  =  «  -I-  R'^B^SSt  and  w  =  w  -»■  eWT^Sk.  Adding 

to  (6),  yidds  J  as 

/(«.v,w.x(0))  .  |||J?oBl(p,+ 


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+i(ll^_  wr^p-^air-> 

+lir,if  +  iV{||V.)}rf/.  (56) 


Note that (54), (55), and (56) hold for any strategy. For convenience let $u_1^*$, $w_1^*$, $v_1^*$, and $x_1^*(0)$ denote Strategy 1; Strategy 2 is indexed by the subscript 2.

Proposition 1: Under Assumption 1, Strategy 1 forms a saddle point, that is,

$$J(u_1^*, v, w, x(0)) \le J(u_1^*, v_1^*, w_1^*, x_1^*(0)) \le J(u, v_1^*, w_1^*, x_1^*(0)) \tag{57}$$

for all $u, v, w \in L_2[0, T]$, $x(0) \in R^n$.

Proof:  If  u  =  uf ,  then  D  =  0.  Hence  from  (56) 
y(ii*,u,H-.x(0))  =  ^BJfolll<o,  +  ^B«(7')B5r-kn 

+lir,i;  +  f/{BV-«)*- 

Since $M^{-1}(T) > 0$ under Assumption 1, it is obvious that

$$J(u_1^*, v, w, x(0)) \le \frac{1}{2}\|\hat{x}_0\|^2_{S(0)}. \tag{58}$$

From (43) and (58), we obtain the left-hand side inequality in (57).

From  (54)  we  obtain 
/(«.v?.w*.  jff(O))  =  ^jB-^oBsi 

Jo 

Therefore 

/(ttf,  wj,  w*,  jrf(0))  =  jB-^olll(P)  s  A“*  »’?•  *f(0)) 

which completes the proof. □

Proposition 2: Under Assumption 1

$$J(u_2^*, v, w, x(0)) \le J(u_2^*, v_2^*, w_2^*, x_2^*(0)) \tag{59}$$

Proof: The proof of the inequality (59) is the same as that of the left-hand side inequality in (57).

Let $\xi^*$ denote the output of (55) when $w_2^*$, $v_2^*$, and $x_2^*(0)$ are applied to the system dynamics and the state estimator. Then, from (55), $\xi^*$ satisfies

{•  »  (>4  -  +  $PSBu,  (*(0)  =  0 

and 

7(ir,wJ,  wj,  xj(0))  =  ^B^oBI(0)  +  ^IU*(^)llM-(n 

+B^^{*lli^0)*-  («) 

If  there  exists  a  real  symmetric  solution  to  the  RDE  (60) 
over  [0,  7],  then  adding 

to  (62)  yields  7  as  a  perfect  square  of  the  form 
7(«,  vj,  wj,  xj(0))  =  ~B^oBs(0) 

2 

+  ^j^ii  +  jR~^B^SPf^*^  dt.  (63) 

Consequently, comparing (43) and (63) yields the inequality (61). □

Suppose that $u$, $v$, $w$, and $x(0)$ play with Strategy 2. By taking the coordinate transformation $x_f = (I + \theta P\Pi)^{-1}\hat{x}$, we obtain the following strategy, say Strategy 3.

Strategy 3:

$$u^* = -R^{-1}B^T\Pi x_f, \quad w^* = -\theta W\Gamma^T\Pi x_f, \quad v^* = 0, \quad x^*(0) = \{I + \theta P_0\Pi(0)\}^{-1}\hat{x}_0 \tag{64}$$

where

$$\dot{x}_f = Ax_f + Bu - \theta\Gamma W\Gamma^T\Pi x_f + MH^T\mathcal{V}^{-1}(z - Hx_f) \tag{65}$$

ft>r  aU  V,  ureLJiO,  7],  x(0)  eR".  In  addition,  if  there  exists 
a  real  symmetric  solution  to  die  RDE 

(>4  -  r(A  - 

-  B^PSBR-^B^SP^+  P-^TWT^P-^  +  H^V-'H 

(60) 

widi  die  final  oondidoo  ^(7^  s  Af~'(7^,  dien 
7(i^,  wj,  wj,  xj(0))  ^  7(«,  wj,  wj,  xj(0)), 

V«€L2l0,7).  (61) 


with $x_f(0) = (I + \theta P_0\Pi(0))^{-1}\hat{x}_0$. Since Strategy 3 is obtained by a coordinate transformation of Strategy 2, it can be assumed that this strategy is equivalent to Strategy 2. However, Proposition 3 shows that this conjecture is false.

Proposition 3: With Assumption 1, Strategy 3 satisfies the saddle point condition

$$J(u_3^*, v, w, x(0)) \le J(u_3^*, v_3^*, w_3^*, x_3^*(0)) \le J(u, v_3^*, w_3^*, x_3^*(0)) \tag{66}$$

for all $u, v, w \in L_2[0, T]$, $x(0) \in R^n$.


Proof: First, prove the left side inequality. If $u$ plays with Strategy 3, then it can be verified that

$$x_f = (I + \theta P\Pi)^{-1}\hat{x} \tag{67}$$

for any $v$, $w$, and $x(0)$. Therefore, $u_3^* = u_1^*$ since $S = \Pi(I + \theta P\Pi)^{-1}$.

Next, the right side inequality of (66) is proved. Suppose $w$, $v$, and $x(0)$ play first with Strategy 3. If $u$ does not play its saddle point strategy, then (67) does not hold. However, comparing (1) to (65), we obtain that $x = x_f$ for any $u$, hence $w_3^* = w_1^*$. From this consideration, Strategy 3 is equivalent to Strategy 1. □

In controller (15) and (53), the worst case disturbances $w^*$ and $x^*(0)$ are not explicit as they are in controller (64) and (65). Clearly, if $w$, $v$, and $x(0)$ play Strategy 3, then by observing (65), the error $x - x_f = 0$ even if $u$ does not play optimally. In contrast, the error $\xi$, defined as $\xi = x - (I + \theta P\Pi)^{-1}\hat{x}$ and propagated by (55) where $w$, $v$, and $x(0)$ play Strategy 2, is nonzero if $\tilde{u} \ne 0$. This is because (67) only holds when $u$ plays its saddle point strategy.

If  u  is  confined  to  be  a  linear  fiinction  of  Z,,  then  the 
oUowing  lenuna  holds. 

Lemma  4;  There  exists  a  ue  such  that  J(u,  v,  w, 
(0))  is  stricdy  coix»ve  with  respect  to  (u,  w,  x(0))  if  and 
nly  if  Assumption  1  or,  equivalendy.  Assumption  2  holds. 
4oreover,  whn  the  assumption  holds,  fi  =  is  a 

ontrol  which  makes  J(u,  v,  w,  x(0))  stricdy  concave  with 
e^>ect  to  (v,  w,  x(0)). 

Proof:  See  appendix.  □ 

Since  tte  transformation  (67)  is  valid  for  any  v,  w,  x(0) 
br  «  =  S  «  -/?~'B^nx,  is  equivalent  to 

i  = 

As  seen  in  Propositions  1  and  2,  for  a  given  u  e  which 
ludces  J(u,  V,  w,  x(0))  stricdy  concave  with  repect  to  (v, 
IV,  x(0))  we  can  find  many  optimal  strategies  (v*,  iv*,  x*(0)) 
>f  (v,  IV,  x(0))  which  satisfies 

7(5.  u,  w,  x(0))  s  y(5,  u*,  IV*,  x*(0)), 

Vu,  IV  6  £2(0,  T],  x(0)6/?". 

flowever,  all  optimal  strategies  produce  the  same  optimal 
lajectory,  that  is,  the  optimal  trafectory  is  unique.  Therefore 

J(5,  V,  IV,  x(0))  <  7(5,  V*,  w*,  x*(0)) 

for  all  VjWeLjlOfTJ,  x(0)eJf"  such  that 
f(0))  (w*(0.  ^*(7),  x*(0))  for  aU  7610, 7]. 

E.  The  Deterministic  Linear-Quadratic  Game  and  the 
LEG  Stochastic  Control  Problem 

The solution of the deterministic linear-quadratic game problem considered here, in particular (15), (16), (29), and (31), is equivalent to the solution of the linear-exponential-Gaussian problem [16]

$$\min_u E\left[-\theta\exp\left\{-\frac{\theta}{2}\left[\int_0^T (x^T Qx + u^T Ru)\, dt + x(T)^T Q_T x(T)\right]\right\}\right]$$

subject to the stochastic linear system

$$\dot{x} = Ax + Bu + \Gamma w,$$
$$z = Hx + \Gamma_1 v$$

where $E[\cdot]$ denotes the expectation operator. $x(0)$ is normally distributed with mean $\hat{x}_0$ and covariance $P_0$, and $w(t)$ and $v(t)$ are jointly Gaussian independent white noise processes with statistics

$$E[w(t)] = 0, \quad E[w(t)w(\tau)^T] = W(t)\delta(t - \tau),$$

$$E[v(t)] = 0, \quad E[v(t)v(\tau)^T] = V(t)\delta(t - \tau)$$

where $\delta(t - \tau)$ denotes the Dirac delta function. It should be noted that the conditions for optimality determined here are not the same as the conditions that are required for the solution in [16]. However, the conditions in [17], [18] do reduce to those given here.

IV. Finite-Time Interval Disturbance Attenuation Problem

In this section, the finite-time interval disturbance attenuation problem is solved by using the results in Section III.

Let  Jg  denote  7  with  finite-time  disturbance 

attenuation  problem  (4)  is  equivalent  to  finding  u  e  4',  satis- 
iyii% 

7o(ii,  V,  w,  x(0))  <  0 

for  all  V,  weLjlO.T],  x(0)e/?",  such  diat  (v(7),  iv(7), 
x(0))  0  over  all  7  e  [0,  T]. 

Theorem  1:  There  exists  a  solution  ue  to  the  finite¬ 
time  disturbance  attenuation  problem  (4)  if  a^  only  if  As¬ 
sumption  1  or  2  bolds.  If  the  assumption  holds,  u  ^ 
-R-'B^Si  =  -£-'£'‘nx,  is  a  solution. 

Proof: 

Sttffkiency:  Suppose  that  assumption  1  or  2  holds.  Then, 
for  u  =  wf  =  -R~*B^Si,  fiom  Proposition  1 

7g(«f,  V,  IV,  x(0))  s  7o(r/f ,  iv^,  xf(0))  =  0 

for  all  w,  iveLjIO,  T\,  x(0)e£".  Since  Xg  =  0.  **(0)  =  0- 
Hence,  for  u*,  v*,  w*,  and  xf(0),  equations  (1)  and  (15) 
become  homogeneous  equations  with  zero  initial  conditions, 
which  leads  x(7)  =  x(7)  =  0  over  [0, 7]  and  (vt(7),  wf(7), 
x*(0))  =  0  over  [0, 7J.  From  Lemma  4,  7g(w*,  v,  w,  x(0)) 
is  stricdy  concave  with  reqiect  to  (v,  w,  x(0)),  hence 

7g(«T.«'.’^.Jf(0))  <0 

for  ail  w,  IV6L2IO.  71,  x(0)6/?"  such  that  (v(7),  iv(7), 
x(0))  ^  0  for  all  7  e  [0, 7]. 

Necessity:  Suppose  that  ue  *tf,  is  a  solution  to  (4).  Since 
56  4';,  Jg(u,  V,  IV,  x(0))  is  a  quadratic  fiinction  widi  respect 
to  (0,  w,  x(0)).  For  (v(7),  iv(7),  x(0))  =  0  over  [0, 7],  z(7) 
=  ffx(t)  from  rriiicfa  the  equation  (1)  with  u  -  u  becomes  a 
bmnogeneous  equation  with  zero  initial  condition,  and  5(7) 
=  0  over  [0, 7].  Tberefore, 

7g(5,  V.  IV,  x(0))  s  7g(5. 0,0,0)  =  0 

Vv,  iveL2(0,7l.x(0)€£" 

and  (v(7),  iv(7),  x(0))  «  0,  7€[0, 71  is  a  unique  extremal 
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tnjectofy.  dut  is,  Joiu.  v,  w,  jr(0))  b  strictly  concave  with 
respect  to  (v.  w,  Jr(0)).  Applying  Lemma  4  completes  the 
proof.  □ 


V.  Time-Invariant  Controller 

In this section, we assume that the systems (1), (2), and (3) are time-invariant systems with zero initial condition; that is, $A$, $B$, $\Gamma$, $H$, $\Gamma_1$, $C$, and $C_1$ are constant matrices. It is also assumed that all weighting matrices in (6) are constant; in particular, $W$ and $V$ are identity matrices. It is also assumed that $(A, B)$ and $(A, \Gamma)$ are stabilizable and controllable pairs, respectively, and $(H, A)$ and $(C, A)$ are detectable pairs. A disturbance attenuation problem for this system has been solved in [1] based on two ARE's associated with the RDE's of (16) and (49). In this section, it is shown that all stabilizing compensators which can be constructed from the solutions of two ARE's satisfy an $H_\infty$ norm bound.

Suppose that there exist a nonnegative definite $\Pi$ and a positive definite $P$ satisfying the following ARE's

$$0 = A^T\Pi + \Pi A - \Pi(BR^{-1}B^T + \theta\Gamma\Gamma^T)\Pi + C^T C \tag{68}$$

$$0 = AP + PA^T - P(H^T\mathcal{V}^{-1}H + \theta C^T C)P + \Gamma\Gamma^T \tag{69}$$

such that

$$P^{-1} + \theta\Pi > 0. \tag{70}$$

By Corollary 1 the solution $\Pi(t)$ to the RDE (49) converges to $\Pi_m$ if $Q_T \le \Pi_m$, where $\Pi_m$ is the minimal nonnegative definite solution to the ARE (68). The solution to RDE (16) has similar properties.

As $T \to \infty$, the compensator described by (64) and (65) becomes a time-invariant controller of the form

$$\dot{x}_f = A_f x_f + B_f z$$

$$u = C_f x_f \tag{71}$$

where

$$A_f = A - BR^{-1}B^T\Pi - MH^T\mathcal{V}^{-1}H - \theta\Gamma\Gamma^T\Pi$$

$$B_f = MH^T\mathcal{V}^{-1}$$

$$C_f = -R^{-1}B^T\Pi$$

$$M = (I + \theta P\Pi)^{-1}P > 0.$$

Note that we can also obtain a time-invariant controller from (15) and (53) which can be transformed to (71) by taking a coordinate transformation $x_f = (I + \theta P\Pi)^{-1}\hat{x}$. In general, there can be more than one nonnegative definite solution to (68) or (69). Therefore, we can construct more than one time-invariant controller from (68) and (69). It is noted that $M$ is a positive definite solution to the ARE resulting from the RDE (51)

$$0 = M(A - \theta\Gamma\Gamma^T\Pi)^T + (A - \theta\Gamma\Gamma^T\Pi)M - M(H^T\mathcal{V}^{-1}H + \theta\Pi BR^{-1}B^T\Pi)M + \Gamma\Gamma^T. \tag{72}$$

By using the time-invariant controller (71) the closed-loop system becomes

i  =  A^x  +  TdiP 

y  =  (73) 


where 


A 

BfH 


BCf 


0 

c.c. 


The  transfer  function  from  the  disturbance  w  to  y  denoted 
by  is  given  as 

Proposition 4: The closed-loop system (73) is stable and


(74) 


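Claim 2: $A - MH^T\mathcal{V}^{-1}H - \theta\Gamma\Gamma^T\Pi$ is a stable matrix.

Proof: Rewrite (72) as

$$M(A - MH^T\mathcal{V}^{-1}H - \theta\Gamma\Gamma^T\Pi)^T + (A - MH^T\mathcal{V}^{-1}H - \theta\Gamma\Gamma^T\Pi)M = -\{M(H^T\mathcal{V}^{-1}H - \theta\Pi BR^{-1}B^T\Pi)M + \Gamma\Gamma^T\}$$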
4 -V. 

Since  (A.T)  b  controllable,  [A  -  fiFr^Il^l  i^control- 
lable.  _From  [22,  4.1],  \A  -  - 

err^n,  +  rrO'^  b  mttroUable.  By  s>- 

]d)ring  the  lemma  again  [A  -  Af/f’‘K“'i/ -  firr^II, 
f/'P]  b  controllable.  Since  Af  >  0  b  the  solution  of  above 
Lyapunov  equation,  the  claim  b  oonqtieted  by  using  [22, 
Leinma4.2].  □ 

Proof  of  Proposition  4:  It  can  be  verified  that  JT 
defined  as 


■r-  -Af-' 

l-Af-'  Af-' 


satisfies  the  ARE 


A5jr+  JTAd  =  -XTJ'lX-i-  9ClC^.  (75) 


Since  (C,  A)  b  detectable,  there  exists  a  L  such  that  A  - 
LC  b  stable.  For  thb  L 


-^d 


L 

0 


BR-^Cl 

BR-^CJ 


A-  LC  _  _0 

BfH  A  -  MH^P-^H -  9rr^n 


Rdiich  from  Claim  2  implies  tiut  (C^,  A.,)  b  detectable. 
(22,  Lemma  4. 11  shows  that  [yrr^rijr-  «<^Cd)'^.  AJ 
b  dectectable.  Observe  tiiat  P~ '  »  M~ '  -  011,  hence 


Therefore, it follows from [22, Lemma 4.2] that $A_{cl}$ is stable.

The $H_\infty$ bound can be deduced from the inequality (57) or (66) as in [7], which considered the state feedback case.
Consider the case where $\hat{x}_0 = 0$, $P_0 = P$, and $Q_T = \Pi$. Then the control law (71) is equivalent to the control law for $u$ in Strategy 3. When the controller (71) is used, from the left-hand side inequality of (66)

$$J(u_3^*, v, w, 0) \le 0, \quad \forall w, v \in L_2[0, T]. \tag{76}$$

Moreover, since the closed-loop system (73) is stable, as $T$ goes to infinity

$$x(t),\ x_f(t),\ u(t),\ y(t) \in L_2[0, \infty) \quad \forall w, v \in L_2[0, \infty)$$

from which $x(\infty) = 0$ $\forall w, v \in L_2[0, \infty)$. Hence, from (76) we obtain

$$\int_0^\infty y^T y\, dt \le -\frac{1}{\theta}\int_0^\infty (w^T w + v^T v)\, dt \quad \forall w, v \in L_2[0, \infty)$$

which is equivalent to (74). □

Note that for all solutions $\Pi \ge 0$ and $P > 0$ to (68) and (69), respectively, such that $P^{-1} + \theta\Pi > 0$, Proposition 4 holds. If we take the minimal nonnegative and positive definite solutions as $\Pi$ and $P$, then the time-invariant controller (71) is equivalent to the $H_\infty$ controller proposed in [1].

Example: Consider evaluating the controller (71) for the scalar system

$$\dot{x} = -1.5x + u + w$$

$$z = x + \frac{1}{\sqrt{14}}v$$

$$Q = 4, \quad R = 2, \quad \theta = -1.$$

The corresponding ARE's are

$$-3\Pi + 0.5\Pi^2 + 4 = 0 \;\Rightarrow\; \Pi = 2, 4$$

$$-3P - 10P^2 + 1 = 0 \;\Rightarrow\; P = 0.2.$$

We obtain two positive solutions for $\Pi$ which satisfy the inequality (70). Therefore, two controllers can be constructed.

For $(\Pi, P) = (2, 0.2)$

$$\dot{x}_f = -5.167x_f + 4.667z, \quad u = -x_f \tag{*}$$

For $(\Pi, P) = (4, 0.2)$

$$\dot{x}_f = -13.5x_f + 14z, \quad u = -2x_f \tag{**}$$

Fig. 1 depicts the largest singular value of the closed-loop transfer function, evaluated at $j\omega$, using the controller (*) or (**), and shows that the two controllers satisfy the $H_\infty$-norm bound (74). In this example, as $\theta$ increases in a negative way, we fail to obtain a positive solution to the $\Pi$ equation for $\theta < -17/16$. This occurs when the Hamiltonian matrix associated with the $\Pi$ equation has two eigenvalues at the origin.
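The numbers in this example can be reproduced with a few lines of arithmetic. The sketch below solves the two scalar ARE's (68) and (69), forms $M$ and the controller coefficients of (71), and checks closed-loop stability from the trace and determinant of the $2 \times 2$ closed-loop matrix. The function and variable names are hypothetical, and `V_INV = 14` rests on the assumption that the measurement noise enters through the gain $1/\sqrt{14}$ with unit $V$, so that the inverse measurement weighting equals 14.

```python
import math


def quadratic_roots(a, b, c):
    # Real roots of a*x**2 + b*x + c = 0 in ascending order (empty if none).
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return []
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2.0 * a), (-b + r) / (2.0 * a)])


A, B, GAM, H = -1.5, 1.0, 1.0, 1.0
Q, R = 4.0, 2.0
V_INV = 14.0  # assumed: inverse of (1/sqrt(14))**2 with V = 1


def pi_solutions(theta):
    # Scalar ARE (68): 0 = 2*A*Pi - (B**2/R + theta*GAM**2)*Pi**2 + Q
    return [x for x in quadratic_roots(-(B * B / R + theta * GAM * GAM), 2 * A, Q) if x > 0]


def p_solutions(theta):
    # Scalar ARE (69): 0 = 2*A*P - (H**2*V_INV + theta*Q)*P**2 + GAM**2
    return [x for x in quadratic_roots(-(H * H * V_INV + theta * Q), 2 * A, GAM * GAM) if x > 0]


def controller(Pi, P, theta):
    # Coefficients (A_f, B_f, C_f) of the time-invariant controller (71).
    M = P / (1.0 + theta * P * Pi)
    A_f = A - B * B / R * Pi - M * H * H * V_INV - theta * GAM * GAM * Pi
    return A_f, M * H * V_INV, -B * Pi / R


def closed_loop_stable(A_f, B_f, C_f):
    # 2x2 matrix [[A, B*C_f], [B_f*H, A_f]] is stable iff trace < 0 and det > 0.
    trace = A + A_f
    det = A * A_f - B * C_f * B_f * H
    return trace < 0 and det > 0


theta = -1.0
for Pi in pi_solutions(theta):
    P = p_solutions(theta)[0]
    gains = controller(Pi, P, theta)
    print(Pi, P, gains, closed_loop_stable(*gains))
```

Calling `pi_solutions` with a value of `theta` slightly below $-17/16$ returns an empty list, which matches the loss of a positive solution noted in the example.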

Fig. 1. Largest singular value plot.

VI. Conclusions

A finite-time disturbance attenuation problem was analyzed by a game theoretic approach. By adopting a calculus of variation technique to solve a linear-quadratic differential game, an extremely straightforward derivation is obtained for a linear n-dimensional compensator. A solution to a finite-time disturbance attenuation problem was obtained by using the results of a LQ game problem with partial information. The resulting compensator can be time-varying, thereby generalizing the results of [1]. The generality of the problem formulation and the simplicity of its solution is due to the use of a state space rather than a frequency domain approach. In Section III-D it is shown that this approach suggests that there could be more than one saddle strategy. Three strategies were considered and sufficient conditions are given for when they satisfy the saddle point condition. Furthermore, it is shown that there exists a control $\bar{u}$ which makes $J(u, w, v, x(0))$ strictly concave if and only if Assumption 1 or 2 is satisfied. Also, the controller $u^*$ given in Section III-C is identical to that of the LEG problem [16] except that certain conditions found in [16] are different.

By specializing to time-invariant systems and infinite-time cost criterion, a time-invariant controller results. To do this we studied the properties of the two RDE's obtained in Sections III-A and III-B, and decomposed their associated ARE's in Section III-C. It was shown that there can be more than one nonnegative definite solution to the ARE and that the norm given by (74) is satisfied by all compensators constructed from all of them which satisfy certain conditions. It is suggested that the minimal nonnegative definite solution to the ARE which has a strong solution [11] be used to design the compensator. Finally, we gave a direct proof that the LEG and differential game controllers satisfy the $H_\infty$ bound [1].

Appendix

For  a  given  B  e  let  J(v,  w,  x(0))  »  J(S,  v,  w,  *(0)). 
The  second  varatioo  along  the  extremal  path  is  given  as 

~[  + 1 
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Taking  a  first  variation  of  (1),  (2),  and  (IS)  for  given 
u  =  u,  we  obtain 

Si:  =  fix  +  J565  +  rSw,  (77) 

5z  =  //Sx  +  r,Su  (78) 

si  =  y*  Sjc  +  BSC  +  P/f^K-'(Sz  -  H Sjc)  -  flPQSJc. 

Sx(0)  =  0  (79) 

Note  fiiat  (79)  is  defined  over  the  interval  where  Pit)  exists. 
Define  e(t)  =  xit)  -  x(t).  Then,  Se  satisfies 

Se=  {A  -  PH'^V-^H)  6e  -  T iw  PH'^V-^T^Sv 

-ePQbx  (80) 

with  a  initial  condition  e(0)  =  —  Sx(0). 

Claim  3:  Suppose  that  Pit)  exists  over  [0,  /,]  and  ue  ‘H?, 
is  given.  If  So  =  Ff  '/fSe  over  (0,  t,],  then  Sjc(r)  =  0  and 
Suit)  =  0  over  [0,  /,].  In  addition,  if  Sw  is  linear  with 
respect  to  Se,  then  any  nonzero  Se(t2).  ^2  ^  ^ 

produced  with  an  appropriate  choice  of  nonzero  Sjy(0). 

Proof:  If  Su  =  Ff'/fSe,  then  Sz  =  HSx  +  F, So  = 
Hhx.  Equation  (79)  for  given  u  =  u,  therefore,  becomes 

Si  =  /*Sjc  +  BSC~  dPQSjr,  Sjc(O)  =  0 

over  [0,  t|].  Since  Sz  is  a  function  of  only  Sx  and  C  is  linear 
with  respect  to  Z„  SJcit)  -  0  and  Suit)  =  0  over  [0,  /iJ. 
Equation  (80)  becomes 

£e  =  ASe-rSw,  Se(0)  -  -Sx(0),  /e[0,f,]. 

If  Sh>  is  linear  in  Se,  then 

Se(0=  -*,(f,fo)MO) 

where  is  the  transition  matrix  of  resulting  linear  system. 
This  complete  the  proof.  □ 

Proof  of  Lemma  4: 

Sufficiency:  Suppose  the  Assumption  1  holds.  Consider 
C  =  -R~  Then  SC  =  -R~  'B^S  Sx.  Ina  similar  way 

in  (56),  we  obtain 

•p-'S{ll?K-.+ IlF.Su  +  i/Sllli^.}  dt 

where 

si  =  (A  -  S{  +  FSw  -  Afff^p-'F,  Sv, 

S{(0)  =  Sx(0). 

The  equality  holds  only  for  S{(7^  =  0,  Siv  =  ITF^P”*  S{ 
and  F,  Sw  *  -HS^.  For  these  values 

si  =  (>i  +  FW'Frp-')  S€,  S€(r)  =  0 

which  implies  6{(0  =  0  over  [0,71,  hence  6u(0  =  0, 
Swit)  =  -eiW^SSHt)  over  [0,  T]  and  Sxp)  =  0.  From 
(77)  and  (78),  SJt(0  =  0.  Therefore,  S^J  =  0  only  if 


(Sv(0.  Swit),  SxiO))  =  0  over  all  f  €  [0,  T\,  from  which 
S^7>0 

for  all  Sv,  Sw,  Sjf(O)  such  that  (Sv(/),  Sw(/),  Sj:(0))  0  for 

all  /6[0,  T],  that  is,  7  is  strictly  concave  with  respect  to 
iv,  w,  x(0)). 

Necessity:  Suppose  that  for  a  C  e  ‘I!'/,  7  is  strictly  concave 
with  respect  to  (u,  w,  x(0)).  Since  7  is  a  quadratic 
function,  the  second  varation  of  7  along  the  extremal  path 
S^7  should  be  negative  for  all  Su,  Sw,  Sx(0)  ^  0  such  that 
(Su(/),  Sw(0,  6x(0))  0  for  all  f  €  [0,  T\. 

a)  Suppose  assumption  (2a)  is  violated.  Let  /,  6  (0,  T]  be 
the  escrqx  time  to  RDE  (16).  Then,  for  some  nonzero  vector 
Pi,  fi{P~\t)P\  0  as  /(</,)-►  Adding 

0  =  ^[Se^(eP)-'  Scl^'  -  j  J‘‘j^{Se^{eP)-'  Se}  dt 

where  0  s  /,  <  r,  to  S^7  yields 

2S^7=  T'dlSJcll^e+llSCf^ 

•'0 

+  ||5w  +  SlFF^p-'Se||j',^)-. 

+  ||F,  Su  -  //Se||?,p,-.}  dt  +  ||6e(Oli(W.„-. 

where 

■'t. 

Choose 

Sw  =  S»^F^p-'Se,  Su  =  -Fj-'i/Se;  for0sr<r, 
Sw  =  0,  Su  =  0;  for  s  /  s  7. 

From  Claim  3,  Sx(r)  =  0  and  SC(r)  =  0  over  [0,  /,],  and 
Sx(0)  ^  0  can  be  choosed  as  Se(/,)  =  p,.  Hence 

26^7=l|pJf,p„,„-.  +  ||Sx^(7)|l|^ 

+  /  (ll«Jfllo+ 

As  t,-*  f,,  S^7  a  0  which  is  a  contradiction. 

b)  Sujqrose  that  assumption  2a)  holds,  but  Assumption  2b) 

is  violat^.  Let  t2  e  [0,  7)  be  the  escape  time  to  RDE  (49). 
Then  for  some  nonzero  vector  pj,  p[nit)p2  f(> 

/,)  -♦  f,.  Adding 

o=+^[sx^nsx]l-^fj^{Sx^nsx}dt 

where  t2^  t^^  T  to  S^J  yields 
2S*7=|Sx(0)llf, +  ||Sx(/,)|| 

no,)  +  mts) 

+  /^{||s«  +  p-'B^nsx|ii 

■'i. 
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+ll«w  +  dt. 


Choose 

6w  =  0,  5i)  =  r,"*/f5c;  forO:S/£/j 
«w=  «u  =  0;  foT  t,<t£T 

and  ix(0)  ^  0  as  5x(r^)  =  pj-  Hence 

^  { ||ax(0)||5.-.  +  ^f~'HSedj 

+  II  ^2 II  nop- 

The  right-hand  side  term  can  be  positive  by  letting  -* 
which  is  a  contradiction. 

c)  Suppose  the  Assumptions  2a)  and  2b)  hold  but  the 
Assumption  2c)  is  violated  at  /  =  /j,  0  £  /j  ^  T.  Then,  for 
some  pj  0,  P3{P“'(/3)  +  fln(/3)}p3  :S  0.  Choose 

«w=  «u  =  rf‘i/6p;  for0</<;r3 

iw  =  -d  6x,  5u  =  0;  for  t^<t^T 

and  dx(0)  ^  0  as  Sxftj)  =  Pj.  From  Claim  3,  5S:(t)  =  0 
and  du(/)  =  0  over  [0,  ^3].  By  identifying  t,  in  b)  and  c) 
as  /3 

26^J=^P3^{p-'(f3)+fin(/3)}p3 

+  f^\\Su  + R-'B^U6x\\]idt^0 

■’h 

which  is  a  contradiction.  □ 

Remark: Similar results that have come to our attention are given in [8], [9].
Application of a Game Theoretic Controller to a Benchmark Problem

The game theoretic controller whose structure is identical to that of both the linear exponential Gaussian and the $H_\infty$ controller is applied to the problem of controlling a mass-spring system that approximates the dynamics of a flexible structure. By viewing the plant parameter variation as an internal feedback loop, plant uncertainties of the system, input, and output matrices can be decomposed into a fictitious input/output system with unknown gains. These fictitious input/output directions due to parameter uncertainty are used in constructing the gains for the game theoretic controller. The resulting control reduces the effect of parameter uncertainty on the system performance.


I. Introduction

A synthesis procedure is described for the design of a state feedback control law for a linear time-invariant system in the presence of parameter uncertainty in the system, input, and output matrices. The parameter uncertainty is modeled via an input/output decomposition procedure (Refs. 1 and 2). A differential game approach has been taken for this problem in Ref. 3, where the parameter uncertainty was not decomposed and only the uncertainty in the system matrix was considered. In Refs. 4^ the Lyapunov stability theory has been used to design a control law for a system with uncertainty. In Refs. 1 and 2, by adopting an input/output decomposition of the parameter uncertainty, the uncertain system is represented as an internal feedback loop (IFL) in which the parameter uncertainty is embedded in the system as a fictitious disturbance. Tahk and Speyer (Refs. 1 and 2) developed the parameter robust linear quadratic Gaussian (PRLQG) synthesis procedure, which is an LQG design based on an extension of loop-transfer recovery for the IFL description. In Refs. 1, 2, and 6, the system is augmented to accommodate the input and output matrix uncertainty. In this paper, by considering the input and a fictitious input in the IFL description as two noncooperative players, a finite-time linear differential game problem is constructed based on the results of Ref. 7. By taking the limit to an infinite-time, time-invariant linear system, a time-invariant control law is obtained. It is shown that the resulting time-invariant controller stabilizes the uncertain system for a prescribed parameter uncertainty bound. These results are presented in Sec. II.

This approach is applied in Sec. III to a benchmark problem composed of two masses and a spring with an unknown spring constant. The input is applied to the first mass and a noisy measurement is made of the position of the second mass. Furthermore, a harmonic forcing function of unknown amplitude and phase is applied to the second mass. The objective is to regulate the second mass about the zero position given the assumed uncertainties. A robust compensator is determined that has four nonminimum phase zeros.
II. Game Theoretic Controller

A  controller  for  a  linear  time-invariant  system  with  parame¬ 
ter  uncertainties  in  the  system,  input,  and  output  matrices  is 
derived  via  the  differential  game  framework. 

Consider  a  time-invariant  system  with  uncertainties  in  sys¬ 
tem,  input,  and  output  matrices  described  by 

$$\dot{x} = (A_0 + \Delta A)x + (B_0 + \Delta B)u \qquad (1)$$

$$z = (H_0 + \Delta H)x \qquad (2)$$

where $x$, $u$, and $z$ denote the state vector, the input vector, and the measurement vector, respectively; $A_0$, $B_0$, and $H_0$ denote the nominal system matrix, the nominal input matrix, and the nominal measurement matrix with suitable dimensions, respectively; and $\Delta A$, $\Delta B$, and $\Delta H$ are perturbations of the system matrix, the input matrix, and the measurement matrix, respectively, due to parameter variations. It is assumed that $(A_0, B_0)$ is a stabilizable pair and $(H_0, A_0)$ is a detectable pair.

By adopting the input/output decomposition modeling (Refs. 1 and 2) of the perturbations, $\Delta A$, $\Delta B$, and $\Delta H$ are represented as

$$\Delta A = D L_a(\epsilon) E, \qquad \Delta B = F L_b(\epsilon) G, \qquad \Delta H = Y L_h(\epsilon) Z \qquad (3)$$

where $\epsilon$ denotes the parameter variation vector, which is constant but unknown, and all other matrices are known constant matrices. The elements of $\epsilon$ need not be independent of each other.
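To make the decomposition in Eq. (3) concrete, the following minimal sketch uses a hypothetical single-mass oscillator, $\ddot{x} + kx = u$ with $k = k_0 + \Delta k$, which is not an example taken from the paper. The matrices `D` and `E` and the function names are assumptions chosen for illustration; the point is only that the perturbation of the system matrix factors into known directions and one unknown gain.

```python
# Hypothetical example (not from the paper): x'' + k x = u with k = k0 + dk,
# so the system matrix is A = [[0, 1], [-k, 0]].

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def delta_a(dk):
    """Perturbation of the system matrix caused by the stiffness deviation dk."""
    return [[0.0, 0.0], [-dk, 0.0]]

def decomposed_delta_a(dk):
    """The same perturbation written as D * L(eps) * E with eps = dk."""
    D = [[0.0], [-1.0]]   # known direction through which the uncertainty enters
    L = [[dk]]            # unknown scalar gain L(eps)
    E = [[1.0, 0.0]]      # known combination of states seen by the uncertainty
    return matmul(matmul(D, L), E)

if __name__ == "__main__":
    print(decomposed_delta_a(0.3))
```

In this sketch the bound on the spectral norm of `L` is simply a bound on the magnitude of `dk`.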

A. State Feedback

In this subsection all states are assumed to be perfectly measured, and the control $u$ is restricted to a state feedback, i.e., $u = u(x)$.

With the plant perturbation modeling given by Eq. (3), the uncertain dynamic system [Eq. (1)] can be represented as an internal feedback loop (Refs. 1 and 2) in which the system is assumed to be forced by fictitious disturbances caused by the parameter uncertainty:

$$\dot{x} = A_0 x + B_0 u + \Gamma_f w_f \qquad (4)$$

$$y_f = \begin{bmatrix} E x \\ G u \end{bmatrix} \qquad (5)$$

$$w_f = L(\epsilon)\, y_f \qquad (6)$$

where $\Gamma_f = [D \;\; F]$, $w_f$ is the fictitious disturbance, and
Consider a quadratic performance index,

$$J = \frac{1}{2}\int_0^T y^T y \, dt$$

where $T$ is a fixed final time, and


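Assume that all admissible parameter variations are characterized as

$$\int_0^T w_f^T w_f \, dt = \int_0^T y_f^T L(\epsilon)^T L(\epsilon)\, y_f \, dt \le \gamma^2 \int_0^T y_f^T y_f \, dt \qquad (7)$$

where $\gamma$ is a positive constant. Note that $\|L(\epsilon)\|_s \le \gamma$ for all admissible parameter variations, where $\|\cdot\|_s$ denotes the spectral norm.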

For a given control law $u = u(x)$, the performance index $J$ achieves its maximum when the parameter variations are the worst case for the control $u$. The worst case occurs when $w_f$ uses all of the available control, i.e.,


Consider a control law that minimizes $J$ for the worst case $w_f$. Then a game situation arises such that

$$\min_{u} \max_{w_f} J$$

subject to Eqs. (1) and (7). Adjoining the constraint (7) to the performance index $J$ yields the problem of

$$\min_{u} \max_{w_f} \frac{1}{2}\int_0^T \left[\rho^2 y^T y + \left(y_f^T y_f - \gamma^{-2} w_f^T w_f\right)\right] dt \qquad (8)$$
subject  to  Eq.  (4),  where  p  is  a  constant  to  be  determined  by 
trial  and  error  to  satisfy  Eq.  (7).  It  is  well  known’-*  that  if 
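there exists a real symmetric solution $\Pi(t)$ over the interval $t \in [0, T]$ to the Riccati differential equation (RDE),

$$-\dot{\Pi} = A_0^T \Pi + \Pi A_0 - \Pi\left(B_0 R^{-1} B_0^T - \gamma^2 \Gamma_f \Gamma_f^T\right)\Pi + Q$$

with the final condition $\Pi(T) = 0$, where

$$Q = \rho^2 C^T C + E^T E, \qquad R = \rho^2 C_1^T C_1 + G^T G$$

and $R$ is assumed to be positive definite, then the optimal strategies $u^*$ and $w_f^*$ for $u$ and $w_f$, respectively, are given as

$$u^* = -R^{-1} B_0^T \Pi(t)\, x, \qquad w_f^* = \gamma^2 \Gamma_f^T \Pi(t)\, x$$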

For the case where $T \to \infty$, if there exists a nonnegative definite solution $\Pi_s$ to the algebraic Riccati equation (ARE),

$$0 = A_0^T \Pi_s + \Pi_s A_0 - \Pi_s\left(B_0 R^{-1} B_0^T - \gamma^2 \Gamma_f \Gamma_f^T\right)\Pi_s + Q \qquad (9)$$


Fig. 1 Mass-spring system.


then $\Pi(t)$ converges to the minimal nonnegative definite solution’ to the ARE (9).’-’ Hence, $u^*$ and $w_f^*$ become time-invariant strategies described by

$$u^* = -R^{-1} B_0^T \Pi_s x \qquad (10a)$$

$$w_f^* = \gamma^2 \Gamma_f^T \Pi_s x \qquad (10b)$$

where $\Pi_s$ is the minimal nonnegative definite solution to the ARE [Eq. (9)].

In the worst case design, since the fictitious disturbance $w_f$ is not an intelligent player, only the control strategy for the control $u$ given by Eq. (10a) can be implemented.

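Claim 1. Suppose that $\mathcal{D}^T\mathcal{D} + \mathcal{G}^T\mathcal{G} > 0$ and let $\mathcal{H}_1$ and $\mathcal{H}_2$ be arbitrary positive-definite matrices with suitable dimension. Then

$$\mathcal{D}^T \mathcal{H}_1 \mathcal{D} + \mathcal{G}^T \mathcal{H}_2 \mathcal{G} > 0$$

Proof. It is sufficient to prove that $\mathcal{D}^T \mathcal{H}_1 \mathcal{D} + \mathcal{G}^T \mathcal{H}_2 \mathcal{G}$ is nonsingular. Suppose that there exists a nonzero $z$ such that $z^T(\mathcal{D}^T \mathcal{H}_1 \mathcal{D} + \mathcal{G}^T \mathcal{H}_2 \mathcal{G})z = 0$. Then $\mathcal{D}z = 0$ and $\mathcal{G}z = 0$ since $\mathcal{H}_1$ and $\mathcal{H}_2$ are positive definite; hence, $(\mathcal{D}^T\mathcal{D} + \mathcal{G}^T\mathcal{G})z = 0$, which contradicts the assumption. ■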

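Claim 2. Let $\mathcal{D}^T\mathcal{D} + \mathcal{G}^T\mathcal{G} = \mathcal{S}^T\mathcal{S}$ and let $\mathcal{D}^T\mathcal{D} + \mathcal{G}^T\mathcal{H}\mathcal{G} = \mathcal{S}_1^T\mathcal{S}_1$, where $\mathcal{H}$ is an arbitrary positive-definite matrix with a suitable dimension. If $(\mathcal{S}, \mathcal{A})$ is detectable, then $(\mathcal{S}_1, \mathcal{A} + \mathcal{K}\mathcal{G})$ is detectable for all $\mathcal{K}$ with suitable dimensions.

Proof. Suppose that $(\mathcal{S}_1, \mathcal{A} + \mathcal{K}\mathcal{G})$ is not detectable. Then there exists a nonzero vector $z$ for some $s$ in the closed right half plane such that $(sI - \mathcal{A} - \mathcal{K}\mathcal{G})z = 0$ and $\mathcal{S}_1 z = 0$. Since $\mathcal{H} > 0$,

$$z^*(\mathcal{S}_1^T\mathcal{S}_1)z = z^*\mathcal{D}^T\mathcal{D}z + z^*\mathcal{G}^T\mathcal{H}\mathcal{G}z = 0$$

which implies that $\mathcal{D}z = 0$ and $\mathcal{G}z = 0$. Hence,

$$(sI - \mathcal{A} - \mathcal{K}\mathcal{G})z = (sI - \mathcal{A})z$$

Therefore,

$$(sI - \mathcal{A})z = 0, \qquad \mathcal{S}z = 0$$

which contradicts the assumption that $(\mathcal{S}, \mathcal{A})$ is detectable. ■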
Proposition 1. Assume that $R > 0$ and $(Q^{1/2}, A_0)$ is a detectable pair. Suppose that for a given $\rho$ and $\gamma$ there exists a nonnegative definite solution $\Pi_s$ to the ARE [Eq. (9)]. Then the control law given as

$$u = -R^{-1} B_0^T \Pi_s x \qquad (11)$$

stabilizes the uncertain dynamic system (1) for all $\epsilon$ such that $\|L_a(\epsilon)\|_s < \gamma$ and $\|L_b(\epsilon)\|_s < \gamma$.

Proof. By using the control law (11), the closed-loop system is described as

$$\dot{x} = A_p x \qquad (12)$$

where

$$A_p = A_0 + D L_a(\epsilon) E - \left[B_0 + F L_b(\epsilon) G\right] R^{-1} B_0^T \Pi_s$$

The ARE [Eq. (9)] can be rewritten as the following Lyapunov equation:

$$A_p^T \Pi_s + \Pi_s A_p = -Q_1 \qquad (13)$$
Fig. 2 Block diagram of the closed-loop system.

where 


With an approach similar to that taken in Sec. II.A, a differential game, where the fictitious disturbances $w_f$ and $v_f$ and the initial conditions play against the control $u$, is constructed such that

min  max  max  max  [-pf  (z(0) -ibin^fO)  -  ibi 

a  Wf  Of 


+  +  y/y/  -  +  v/v/) I  d/1 

subject to Eqs. (14) and (15), where the cost for the initial conditions is included to handle the uncertainty in the initial condition from the nominal value of $\hat{x}_0$. As $T \to \infty$, a time-invariant controller is obtained in Ref. 7 as


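$$Q_1 = \rho^2 C^T C + E^T \Lambda_a E + \Pi_s B_0 R^{-1} \Lambda_b R^{-1} B_0^T \Pi_s$$
$$\qquad + \gamma^2\left(\Pi_s D - \gamma^{-2} E^T L_a^T\right)\left(\Pi_s D - \gamma^{-2} E^T L_a^T\right)^T$$
$$\qquad + \gamma^{-2}\,\Pi_s\left(\gamma^2 F + B_0 R^{-1} G^T L_b^T\right)\left(\gamma^2 F + B_0 R^{-1} G^T L_b^T\right)^T \Pi_s$$

$$\Lambda_a = I - \gamma^{-2} L_a(\epsilon)^T L_a(\epsilon)$$

$$\Lambda_b = \rho^2 C_1^T C_1 + G^T\left[I - \gamma^{-2} L_b(\epsilon)^T L_b(\epsilon)\right] G$$

$\|L_a(\epsilon)\|_s < \gamma$ implies that $\Lambda_a > 0$, and $\|L_b(\epsilon)\|_s < \gamma$ and Claim 1 yield $\Lambda_b > 0$. Hence, $Q_1$ is nonnegative definite. Since $(Q^{1/2}, A_0)$ is detectable by assumption, it follows from Claim 2 that $([\rho^2 C^T C + E^T \Lambda_a E]^{1/2}, A_0 + D L_a E)$ is detectable. From Lemma 4.1 (Ref. 9), $([\Pi_s B_0 R^{-1} \Lambda_b R^{-1} B_0^T \Pi_s + E^T \Lambda_a E + \rho^2 C^T C]^{1/2}, A_p)$ is also a detectable pair. Applying Lemma 4.1 of Ref. 9 again yields that $(Q_1^{1/2}, A_p)$ is detectable. Applying Lemma 4.2 of Ref. 9 to the Lyapunov equation [Eq. (13)] completes the proof. ■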

Note that Proposition 1 holds for any nonnegative solution to the ARE [Eq. (9)]. However, the minimal nonnegative solution $\Pi_s$ produces the smallest gain for the control law.

To design the controller (11), the design parameters $\rho$ and $\gamma$ should be chosen for the ARE [Eq. (9)] to have a nonnegative definite solution. In particular, as the value of $\rho$ increases, system performance improves, whereas as the value of $\gamma$ increases, stability robustness with respect to parameter variation improves.
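The requirement that the ARE admit a nonnegative definite solution limits how large $\gamma$ can be made. A minimal scalar sketch of this trade-off follows; it assumes the disturbance term enters Eq. (9) as $\gamma^2 \Gamma_f \Gamma_f^T$, and the numbers and the function name `scalar_game_are` are hypothetical rather than taken from the paper.

```python
import math

def scalar_game_are(a, b, g, q, r, gamma):
    """Nonnegative root P of the scalar ARE 0 = 2*a*P - c*P**2 + q,
    where c = b**2/r - gamma**2 * g**2 (g plays the role of Gamma_f).
    Returns None when no real nonnegative root exists."""
    c = b * b / r - gamma ** 2 * g * g
    if abs(c) < 1e-12:
        # The equation degenerates to the linear one 0 = 2*a*P + q.
        return -q / (2.0 * a) if a < 0 else None
    disc = a * a + c * q
    if disc < 0:
        return None  # gamma is too large for this plant and weighting
    p = (a + math.sqrt(disc)) / c
    return p if p >= 0 else None

if __name__ == "__main__":
    for gamma in (0.5, 1.2, 2.0):
        p = scalar_game_are(a=-1.0, b=1.0, g=1.0, q=1.0, r=1.0, gamma=gamma)
        print(gamma, p)
```

For the stable scalar plant used here, the solution exists for the two smaller values of `gamma` and disappears for the largest one, which is the scalar counterpart of choosing $\rho$ and $\gamma$ so that Eq. (9) remains solvable.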

B. Measurement Feedback

In this subsection the state is assumed to be partially measured by Eq. (2) and the control $u$ is restricted to be a measurement feedback.

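By use of the uncertainty modeling of Eq. (3), system (1) and (2) admits an IFL description:

$$\dot{x} = A_0 x + B_0 u + \Gamma_f w_f \qquad (14)$$

$$z = H_0 x + Y v_f \qquad (15)$$

The controller takes the form

$$\dot{x}_c = A_c x_c + B_c z \qquad (18a)$$

$$u = C_c x_c \qquad (18b)$$

where

$$A_c = A_0 - B_0 R^{-1} B_0^T \Pi_\infty - M H_0^T V^{-1} H_0 + \gamma^2 \Gamma_f \Gamma_f^T \Pi_\infty$$

$$B_c = M H_0^T V^{-1}, \qquad C_c = -R^{-1} B_0^T \Pi_\infty, \qquad M = \left(P_\infty^{-1} - \gamma^2 \Pi_\infty\right)^{-1}$$

if there exist $\Pi_\infty \ge 0$ and $P_\infty > 0$ satisfying the AREs:

$$0 = A_0^T \Pi_\infty + \Pi_\infty A_0 - \Pi_\infty\left(B_0 R^{-1} B_0^T - \gamma^2 \Gamma_f \Gamma_f^T\right)\Pi_\infty + Q_\infty \qquad (19)$$

$$0 = A_0 P_\infty + P_\infty A_0^T - P_\infty\left(H_0^T V^{-1} H_0 - \gamma^2 Q_\infty\right)P_\infty + \Gamma_f \Gamma_f^T \qquad (20)$$

such that $P_\infty^{-1} - \gamma^2 \Pi_\infty > 0$, where

$$Q_\infty = \rho^2 C^T C + E^T E + Z^T Z, \qquad V = Y Y^T$$

and $V$ is assumed to be positive definite. Since both AREs [(19) and (20)] may have more than one nonnegative definite or positive definite solution, many controllers can be constructed from the formulation (18). Note that $M > 0$ and satisfies the ARE:

$$\left(A_0 + \gamma^2 \Gamma_f \Gamma_f^T \Pi_\infty\right)M + M\left(A_0 + \gamma^2 \Gamma_f \Gamma_f^T \Pi_\infty\right)^T - M\left[H_0^T V^{-1} H_0 - \gamma^2 \Pi_\infty B_0 R^{-1} B_0^T \Pi_\infty\right]M + \Gamma_f \Gamma_f^T = 0 \qquad (21)$$

By using the controller (18), the closed-loop system becomes

$$\dot{x}_{cl} = \left(A_{cl} + \Delta A_{cl}\right) x_{cl}, \qquad x_{cl} = \begin{bmatrix} x \\ x_c \end{bmatrix} \qquad (22)$$

where

$$A_{cl} = \begin{bmatrix} A_0 & B_0 C_c \\ B_c H_0 & A_c \end{bmatrix}, \qquad \Delta A_{cl} = \begin{bmatrix} D L_a E & F L_b G C_c \\ B_c Y L_h Z & 0 \end{bmatrix}$$

and where $\Delta A_{cl}$ is the variation of the closed-loop system due to the uncertainty in system (1) and (2).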

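Proposition 2. Assume that $R > 0$, $V > 0$, $(Q_\infty^{1/2}, A_0)$ is detectable, and $(A_0, \Gamma_f)$ is controllable. If there exist $\Pi_\infty \ge 0$, $P_\infty > 0$ such that $P_\infty^{-1} - \gamma^2 \Pi_\infty > 0$, then the controller (18) stabilizes the closed-loop system (22) for all $\epsilon$ such that

$$\|L_a(\epsilon)\|_s < \gamma, \qquad \|L_b(\epsilon)\|_s < \gamma, \qquad \|L_h(\epsilon)\|_s < \gamma \qquad (23)$$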


where  P/  =  (f>  £]  and  uy  »  [lef 


Proof. Equation (21) can be rewritten as

$$A_1 M + M A_1^T = -Q_2$$

where

$$A_1 = A_0 - M H_0^T V^{-1} H_0 + \gamma^2 \Gamma_f \Gamma_f^T \Pi_\infty$$

$$Q_2 = M\left(H_0^T V^{-1} H_0 + \gamma^2 \Pi_\infty B_0 R^{-1} B_0^T \Pi_\infty\right)M + \Gamma_f \Gamma_f^T$$

Since $(A_0, \Gamma_f)$ is controllable by assumption, it follows from Lemma 4.1 (Ref. 9) that $(A_1, Q_2^{1/2})$ is controllable. Since $M > 0$, it follows from Lemma 4.2 (Ref. 9) that $A_1$ is stable. It can be verified that $\mathcal{X}$, defined as


[ 


-Af-' 


-M 

M 


satisfies  the  ARE: 

0 A  J3C  +  3C4d  +  XQsX  + 


(24) 


where 


Ox 


TfTj  0 
0  BcVBlj  ’ 


Rewriting  Eq.  (24)  as 


(Ad  +  AAei)^3C  +  3C  (^d  +  AAd)  *  -  Q4  (25) 

where  Q*  -  XQ2X  +  y^Qj  -  AAjX  -  XAA^.  Then  Q4  can 
be  written  in  the  following  form: 


•  'C -  A/- 'f] 

(■  Pm'f  ]’■ 

L iujbR 

L  Af-*D  JL  Af-'n  J 

r +  Hly-'Y]  r Z^Ll  +  HjV-*Yy 

'*’1  -HoV-'Y  Jl  -fUV-'Y  J 


+  7^ 


fi^C^C  +  E^A,E  +  Z^A*Z  0  1 

0  ilar-'a*r-'b«^ii„J 


where

$$\Lambda_h = I - \gamma^{-2} L_h^T L_h$$

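Since $\|L_a(\epsilon)\|_s < \gamma$, $\|L_b(\epsilon)\|_s < \gamma$, and $\|L_h(\epsilon)\|_s < \gamma$, $Q_4$ is nonnegative definite. From Lemma 4.2 (Ref. 9),

$$\left(\left(\rho^2 C^T C + E^T \Lambda_a E + Z^T \Lambda_h Z\right)^{1/2},\; A_0 + D L_a E\right)$$

is detectable since $(Q_\infty^{1/2}, A_0)$ is detectable and $\Lambda_a, \Lambda_h > 0$. Therefore, there exists $L$ such that $A_2$, defined as

$$A_2 = A_0 + D L_a E - L\left(\rho^2 C^T C + E^T \Lambda_a E + Z^T \Lambda_h Z\right)^{1/2}$$

is stable. Let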

[(p^C'^C  +  ErA^  +  Z^^Z)**  0  ] 

0  Ar/f-'E/luJ 


Since $I - \gamma^{-2} L_b^T L_b > 0$ and $R > 0$, it follows from Claim 1 that $\Lambda_b > 0$. Observe that, for this $L$,

=  f 

LA«fo’’F-'(A/o+ W.*Z)  A,\ 


Since $A_1$ and $A_2$ are stable, $(Q_5, A_{cl} + \Delta A_{cl})$ is a detectable pair, which implies that, by Lemma 4.1 of Ref. 9, $(Q_4, A_{cl} + \Delta A_{cl})$ is detectable. Since $\mathcal{X}$ is a nonnegative definite matrix, the proof is completed by applying Lemma 4.2 of Ref. 9 to Eq. (25). ■

Note that the proposition holds for all controllers constructed from the solution of the AREs and is therefore very conservative.

To design the controller (18), the design parameters $\rho$ and $\gamma$ should be chosen for the AREs [(19) and (20)] to have a nonnegative definite solution and a positive definite solution, respectively. In particular, as the value of $\rho$ increases, system performance improves, whereas as the value of $\gamma$ increases, stability robustness with respect to parameter variation improves.

In the usual case the positivity of $V$ and the controllability of $(A_0, \Gamma_f)$ do not hold. However, these difficulties can be avoided by redefining $V$ and $\Gamma_f$ as

$$V = \Gamma_1 \Gamma_1^T + Y Y^T, \qquad \Gamma_f = [\Gamma \;\; D \;\; F]$$

where $\Gamma_1$ and $\Gamma$ are chosen to ensure that $V > 0$ and $(A_0, \Gamma_f)$ is controllable. It can be proved with minor changes that Proposition 2 holds for these new $V$ and $\Gamma_f$.

III.  Two  Mass-Spring  System 

Consider a mass-spring system, shown in Fig. 1, that approximates the dynamics of a flexible structure. The system is described by

$$\ddot{x}_1 + k(x_1 - x_2) = u \qquad (26a)$$

$$\ddot{x}_2 + k(x_2 - x_1) = w \qquad (26b)$$

with a noncollocated measurement

$$z = x_2 \qquad (27)$$

where $k$ is an unknown constant with nominal value $k_0 = 1$, $u$ is an actuator input, and $w$ is a cyclic disturbance described by

$$w(t) = A_w \sin(0.5t + \phi)$$

where $A_w$ and $\phi$ are constant but unknown. The transfer function form of the system and measurement equations, Eqs. (26) and (27), respectively, is given as

$$G(s) = \frac{z(s)}{u(s)} = \frac{1}{s^2(s^2 + 2)}$$

The design objective is to regulate $x_2$ and to reject the external cyclic disturbance in $x_2$ for all $k$ with $0.5 < k < 2$.

To handle the cyclic disturbance, differentiate Eq. (26) until $w$ disappears in the resulting system. Differentiating Eq. (26) twice and using $\ddot{w} = -0.25w$ yields

$$x_1^{(4)} = -k\left(\ddot{x}_1 + 0.25x_1 - \ddot{x}_2 - 0.25x_2\right) - 0.25\ddot{x}_1 + \bar{u} \qquad (28a)$$

$$x_2^{(4)} = -k\left(\ddot{x}_2 + 0.25x_2 - \ddot{x}_1 - 0.25x_1\right) - 0.25\ddot{x}_2 \qquad (28b)$$

where the parenthetical superscripts represent the time-derivative order and $\bar{u}$ is a new control variable defined as

$$\bar{u} = \ddot{u} + 0.25u \qquad (29)$$
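The differentiation step works because the operator $d^2/dt^2 + 0.25$ annihilates any sinusoid of frequency 0.5 rad/s, whatever its amplitude and phase. The following minimal sketch checks this numerically with a central difference; the step size, sample times, and function names are arbitrary choices made for illustration.

```python
import math

def w(t, amp, phase):
    """Cyclic disturbance w(t) = amp * sin(0.5 t + phase)."""
    return amp * math.sin(0.5 * t + phase)

def filtered(t, amp, phase, h=1e-3):
    """Central-difference estimate of w'' + 0.25 w at time t."""
    w_dd = (w(t + h, amp, phase) - 2.0 * w(t, amp, phase)
            + w(t - h, amp, phase)) / h ** 2
    return w_dd + 0.25 * w(t, amp, phase)

if __name__ == "__main__":
    for t in (0.0, 1.3, 7.9):
        print(t, filtered(t, amp=0.5, phase=0.7))
```

The printed values are zero up to discretization error, so the disturbance no longer appears in the twice-differentiated system.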


The  variation  of  system  matrix  due  to  the  uncertainty  of  k  can 
be  decomposed  as  • 


AA 


[v][- 


n  0  0.25n  0  n  0 


0 

I 

0 

0 

0 

-1 


With  choices  of 

c  =  //.  r  -  0.07  •  (0  1  0  0  0  r,  -  0.033 
C,-0.08,  p=l,  7-0.043.  n  =  10 
the control $\bar{u}$ can be obtained by using Eq. (18) as

$$\dot{x}_c = A_c x_c + B_c z, \qquad \bar{u} = C_c x_c \qquad (31)$$


where 


f  0 

1 

8.41 

0 

0 

0 

-147.26  -17.11 

60.32 

-61.00 

44.21 

-174.52 

0 

0 

-7.19 

1 

0 

0 

0 

0 

-25.78 

0 

1 

0 

0 

0 

-51.65 

0 

0 

1 

1.03 

0 

-41.20 

0.11 

-1.11 

0.30 

1-8.41  - 

37.85  7.19 

-25.78  51.65  40.40)’^ 

\  Cc=(- 146.23  -17.11  22.24  -  60.89  43.34  -174.22) 


The new system (28) contains uncontrollable poles at $s = \pm 0.5j$. To remove the uncontrollable poles from Eqs. (28), a new state $\xi$ is introduced as $\xi = \ddot{x}_1 + 0.25x_1$. Then Eqs. (28) are represented in terms of $x_2$ and $\xi$ as

$$\ddot{\xi} = -k\xi + k\left(\ddot{x}_2 + 0.25x_2\right) + \bar{u} \qquad (30a)$$

$$x_2^{(4)} = -(k + 0.25)\ddot{x}_2 - 0.25k\,x_2 + k\xi \qquad (30b)$$

A controller is designed for this augmented system. Figure 2 shows that the controller for the original system is constructed by combining the controller for the augmented system (30) and the relation (29). Define

$$x = \left[\xi \;\; \dot{\xi} \;\; x_2 \;\; \dot{x}_2 \;\; \ddot{x}_2 \;\; x_2^{(3)}\right]^T$$


Note that $n$ is a weighting between the $E$ direction and $C$, and $\Gamma$ is chosen to ensure that $(A_0, \Gamma_f)$ is controllable (see Proposition 2). The design parameters $\rho$, $\gamma$, and $n$ were chosen to satisfy the robustness requirement that $0.5 < k < 2$ and the transient requirement that the system settle within 20 s. The minimal nonnegative definite solutions for the AREs [(19) and (20)] are used in the controller design. Combining Eqs. (29) and (31) yields an eighth-order controller for the original system (28) in the form of


Ac  Otx  j 


M=  (0,x*  1  0)4^ 


or  in  the  transfer  function  form: 


Then  Eqs.  (27)  and  (30)  can  be  represented  in  state-space  form 

as 


0 

1 

0 

0 

0 

o' 

o' 

-4 

0 

0.254 

0 

4 

0 

1 

0 

0 

0 

1 

0 

0 

X  + 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

1 

1 

k 

V.  ■  „ 

0 

-0.254 

0 

-4-0.25 

0 

0 

V' 

-V - - 

A  B 


z  -  (0  0  1  0  0  0]x 

>■  _ / 

H 


$$K(s) = \frac{u(s)}{z(s)} = \frac{-4430\,(s + 0.08)(s - 0.44)(s - 2.83)(s^2 - 0.1s + 0.24)}{(s^2 + 0.25)(s^2 + 1.78s + 9.67)(s^2 + 6.56s + 13.51)(s^2 + 15.68s + 124.99)}$$

where the compensator poles are at $\pm 0.5j$, $-7.84 \pm 7.97j$, $-3.28 \pm 1.66j$, and $-0.89 \pm 2.98j$, and the complex compensator zeros are at $0.05 \pm 0.48j$. The zero configuration of the compensator represents nonminimum phase compensation. The closed-loop poles are at $-2.14$, $-0.33$, $-0.13 \pm 0.56j$, $-0.19 \pm 0.25j$, $-0.77 \pm 2.50j$, $-1.83 \pm 1.45j$, and $-7.84 \pm 7.97j$.
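The quadratic factors in the compensator denominator can be cross-checked against the listed pole locations: a pole pair $a \pm bj$ corresponds to the factor $s^2 - 2as + (a^2 + b^2)$. The short sketch below performs that arithmetic; the helper name is illustrative, and agreement is only expected to the two decimals printed in the text.

```python
def quadratic_factor(real, imag):
    """Coefficients (b, c) of s^2 + b s + c whose roots are real +/- j*imag."""
    return (-2.0 * real, real ** 2 + imag ** 2)

# Compensator pole pairs as listed in the text (real part, imaginary part).
COMPENSATOR_POLES = [(-0.89, 2.98), (-3.28, 1.66), (-7.84, 7.97)]

if __name__ == "__main__":
    for re_part, im_part in COMPENSATOR_POLES:
        b, c = quadratic_factor(re_part, im_part)
        print(f"s^2 + {b:.2f} s + {c:.2f}")
```

The pair $\pm 0.5j$ gives the remaining factor $s^2 + 0.25$, which is the internal model of the 0.5 rad/s disturbance introduced through Eq. (29).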
Fig. 4 Root loci of closed-loop system with respect to k.

Fig. 5 Time response.


Figure 3 shows the root locus of $1 - \alpha G(s)K(s)$ with respect to $\alpha$. For the controller $K(s)$, the gain margin is 3.23 dB at the frequency 0.67 rad/s, and the phase margin is 25 deg at the frequency 0.20 rad/s. The values of $\gamma$ and $n$ show that the stability margin $|\Delta k| < 0.43$ is guaranteed for this compensator. However, the root locus, shown in Fig. 4, shows that the compensator stabilizes the system over $0.5 < k < 2$. Note that the compensator poles at $-7.84 \pm 7.97j$ are unaffected by parameter or open-loop gain changes. The removal of these poles from the compensator does not affect stability robustness or transient response.

In the simulation the measurement is assumed corrupted by zero mean white Gaussian noise with a power spectral density of $(0.33)^2$. Time responses, shown in Fig. 5, for the nominal system and the perturbed system with $k = 0.5$ are simulated. For both simulations, $A_w = 0.5$, $\phi = 0$, and all initial conditions are zero. Figure 5 shows that, for the nominal case, the controlled variable $x_2$ has settled down and the cyclic disturbance is rejected in $x_2$ in about 20 s, and for the perturbed system with $k = 0.5$, the settling time has been delayed.

IV.  Conclusions 

A  game  theoretic  controller  was  applied  to  a  mass-spring 
system  disturbed  by  cyclic  external  force.  The  cyclic  distur¬ 
bance  was  augmented  to  the  system  by  a  procedure  involving 
differentiation  and  transformation.  The  resulting  state  and 
control  are  used  to  design  the  game  theoretic  compensator.  A 
nonminimum  phase  compensator  resulted. 
an elimination method such as the cyclic reduction method. This line of reasoning should provide a further clue to improve present parallel and sequential algorithms for engineering computation by exploiting specific system structures.
System Characterization of Positive Real Conditions
H. Weiss, Q. Wang, and J. L. Speyer


Abstract: Necessary and sufficient conditions for positive realness in terms of state-space matrices are presented under the assumption of complete controllability and complete observability of square systems with independent inputs. By a particular transform of these conditions, a direct algorithm for testing positive realness is determined that requires only checking a set of simple algebraic conditions. This provides an alternative procedure to the positive real lemma and to the s-domain inequalities. Based on this algorithm, a synthesis of a positive real system via output feedback is presented.


I. Introduction

Positive real systems play a major role in control theory, especially in adaptive control, and in stability analysis. The impressive development of adaptive control and self-tuning regulation over the last two decades [1], [2] hinges on satisfaction of some positive realness conditions. Alternatively, considerable initial knowledge about the controlled plant must be given. The prior knowledge is used to implement reference models, identifiers, or observer-based controllers of about the same order as the plant. Since the prior assumptions about the controlled plant may never be entirely satisfied, the stability properties of the related adaptive schemes are debatable. Therefore, a direct adaptive control procedure which does not use identifier or observer-based controllers in the feedback loop is preferred. The implementation of such an algorithm requires positive real controlled plants or, alternatively, a synthesis of a positive real plant on the basis of the actual plant.

The existing tools for analysis and synthesis of positive real systems are based either in the s-domain on complex variable inequalities, which are inconvenient, or in the state space on the positive real lemma equations. These tools are computationally complex and there is a need for an easily used complementary tool. Necessary and sufficient conditions for positive real systems equivalent to the positive real lemma are developed in Sections II and III using optimal control theory for the associated partially singular problem. Positive real systems are characterized in terms of the necessary and sufficient conditions of optimal control theory such as the generalized Legendre-Clebsch condition [3], [4]. By using a special transformation, the resulting test for positive realness reduces to testing certain square matrices for positive definiteness related to the generalized Legendre-Clebsch condition and the solution to an algebraic Riccati equation of possibly reduced dimension. This test for positive realness parallels that given in [5]. The transformation used here, based on the results of [6], produces interesting characterizations of positive real systems in terms of system zeros and the necessary conditions in singular optimal control. That a positive real system is of minimum phase becomes transparent in this development. In Section IV, it is proved that if a square system is minimum phase with certain positive real characteristics, there exists a constant output feedback gain such that the resulting closed-loop system is positive real. Some examples are shown in Section V to illustrate the theory. Concluding remarks are given in Section VI.

The derivation of this new test for positive realness is based upon a state-space formulation of dissipative systems. Basic definitions and physical characteristics are presented below.


Remark 1.4: Since this test is done for all frequencies, the amount of computation for matrix systems may be large. The approach here only requires a finite number of operations which guarantee positive realness.

II. Relations Between Optimal Control and Positive Realness


A. Dissipative System

Consider the system input-output description $\Sigma: U \to Y$ where $U = L_2^l(R_+)$ and $Y = L_2^m(R_+)$. The notation $L_2(R_+)$ is used to denote the space of square integrable functions where $R_+ = [t_0, \infty)$. The supply rate associated with this system is defined as the function $w: R^l \times R^m \to R$ where

$$w(u, y) = y'Qy + 2y'Su + u'Ru \qquad (1.1)$$

and $Q \in R^{m \times m}$, $S \in R^{m \times l}$, $R \in R^{l \times l}$ are constant matrices, with $Q$ and $R$ symmetric.

Definition 1.1 [7]: A dynamical system $\Sigma$ is dissipative with respect to the supply rate $w(u, y)$ if and only if

$$\int_{t_0}^{t_1} w[u(t), y(t)]\, dt \ge 0 \qquad (1.2)$$

for all $t_1 \ge t_0$ and all $u \in U$, whenever the initial state satisfies $x(t_0) = 0$. The concept of a supply rate is related in the general case to the "stored energy" for the system.

Remark 1.1: Passivity corresponds to dissipativeness where $Q = R = 0$, $l = m$, $S = (1/2)I_m$, and $I_m$ is the $m \times m$ identity matrix.

Remark 1.2: Positive realness corresponds to passivity where the dynamical system is linear and time invariant.

Assume that the system under consideration is linear and time-invariant, giving

$$\dot{x} = Ax + Bu \qquad (1.3)$$

$$y = Cx + Du \qquad (1.4)$$

where $x \in R^n$, $u \in R^l$, $y \in R^m$ and $A$, $B$, $C$, and $D$ are constant matrices with appropriate dimensions.


B. Review of the Positive Real Property

The positive real property is related directly to the transfer function matrix description of the system. The positive real lemma, presented in Section II, connects the positive realness to the parameters of a system realization with complete controllability and complete observability.

The Positive Real Property [8, p. 51]: Let $G(s)$ be an $m \times m$ matrix of functions of a complex variable $s$. Then $G(s)$ is termed positive real if the following conditions are satisfied:

i) All the elements of $G(s)$ are analytic in $\mathrm{Re}[s] > 0$.

ii) $G(s)$ is real for real positive $s$.

iii) $G^*(s) + G(s) \ge 0$ for $\mathrm{Re}[s] > 0$, where $(\cdot)^*$ denotes complex conjugate transpose.

Remark 1.3: If $G(s)$ is a real rational matrix of functions of $s$, then necessary and sufficient conditions for the positive real property to hold are given by the following theorem.

Theorem 1.1 [8, p. 53]: Let $G(s)$ be a real rational matrix of functions of $s$. Then, $G(s)$ is positive real if and only if

i) No element of $G(s)$ has a pole in $\mathrm{Re}[s] > 0$.

ii) $G^*(j\omega) + G(j\omega) \ge 0$ for all real $\omega$, with $j\omega$ not a pole of any element of $G(s)$.

iii) If $j\omega_0$ is a pole of any element of $G(s)$, it is at most a simple pole, and the residue matrix,

$$\lim_{s \to j\omega_0}(s - j\omega_0)G(s) \quad \text{if } \omega_0 \text{ is finite}$$

$$\lim_{\omega \to \infty} G(j\omega)/j\omega \quad \text{if } \omega_0 \text{ is infinite}$$

is nonnegative definite Hermitian.
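Condition ii) of Theorem 1.1 is the kind of frequency-by-frequency test the state-space approach is meant to replace. A minimal scalar sketch of such a sweep is given below; the two transfer functions and the frequency grid are hypothetical examples, and a finite grid can only refute positive realness, never prove it.

```python
def hermitian_part_scalar(G, omega):
    """For a scalar G, G*(jw) + G(jw) equals 2 * Re G(jw)."""
    return 2.0 * G(1j * omega).real

def passes_frequency_sweep(G, omegas):
    """True if G*(jw) + G(jw) >= 0 at every sampled frequency."""
    return all(hermitian_part_scalar(G, w) >= 0.0 for w in omegas)

def g_first_order(s):
    """Hypothetical example (s + 2)/(s + 1): real part (2 + w^2)/(1 + w^2) > 0."""
    return (s + 2.0) / (s + 1.0)

def g_double_lag(s):
    """Hypothetical example 1/(s + 1)^2: real part is negative for w > 1."""
    return 1.0 / ((s + 1.0) ** 2)

if __name__ == "__main__":
    grid = [0.01 * i for i in range(2001)]   # 0 to 20 rad/s
    print(passes_frequency_sweep(g_first_order, grid))
    print(passes_frequency_sweep(g_double_lag, grid))
```

Both examples are stable with no poles on the imaginary axis, so conditions i) and iii) are not at issue and the sweep alone separates them.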


A. The Related Variational Problem

Consider minimizing the cost functional

$$V[x_0, t_0, u(\cdot)] = \int_{t_0}^{t_1} w[u(t), y(t)]\, dt \qquad (2.1)$$

where the supply rate is

$$w(u, y) = y'u = \tfrac{1}{2}\left(u'Ru + 2x'C'u\right) \qquad (2.2)$$

subject to the dynamic system (1.3) and (1.4), where $R = D + D'$. The dimension of $u$ and $y$ is assumed to be $m$. Denote $u^*(\cdot) \in U$ as the control which minimizes (2.1) subject to the dynamic equations (1.3) and (1.4), where $x(t_0) = x_0$ is prescribed. The necessary and sufficient conditions for $V^*[x_0, t_0] = V[x_0, t_0, u^*(\cdot)]$ to be bounded from below are equivalent to the necessary and sufficient conditions for $V[0, t_0, u(\cdot)]$ to be nonnegative definite (positive real).
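The form of the supply rate in (2.2) follows from substituting the output equation (1.4) into $y'u$ and symmetrizing the quadratic term in $u$; the short derivation below spells out that step using only the symbols already defined.

```latex
\begin{aligned}
y'u &= (Cx + Du)'u \\
    &= x'C'u + u'D'u \\
    &= x'C'u + \tfrac{1}{2}\,u'(D + D')u
       && \text{since } u'D'u = u'Du \text{ (a scalar equals its transpose)} \\
    &= \tfrac{1}{2}\left(u'Ru + 2x'C'u\right),
       && R = D + D'.
\end{aligned}
```

Only the symmetric part of $D$ enters the cost, which is why the rank of $R = D + D'$, rather than that of $D$, decides how many control directions are singular.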

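Remark 2.1: If $R \ge 0$ and $\mathrm{rank}(R) = r < m$, there exists an orthogonal transformation matrix $\Gamma = [\Gamma_1, \Gamma_2]$ such that

$$\Gamma' R \Gamma = \begin{bmatrix} R_r & 0 \\ 0 & 0 \end{bmatrix} \qquad (2.3)$$

where $R_r$ is positive definite [9]. Positive realness of $\bar{G}(s) = \Gamma' G(s) \Gamma$ is not affected by this transformation.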


B. Positive Real Lemma Equations

Necessary and sufficient conditions for $V^*[x_0, t_0]$ to be bounded from below over a finite time interval $[t_0, t_1]$ are presented in [10, Theorem II.3.3]. The required positive real conditions are obtained via the extension of Theorem II.3.3 to the time-invariant, infinite-time interval case [11].

Under the complete controllability and complete observability assumption of system (1.3) and (1.4), necessary and sufficient conditions for the nonnegativity of $V[0, t_0, u(\cdot)]$ are that there exist $\Pi < 0$, $L$, and $W$ such that

[’AVc  = 

where $W$ and $L$ are of proper dimensions.

By identifying $P = -\Pi$, the positive real lemma is stated below.

The Positive Real Lemma [8, p. 218]: Let $G(s)$ be an $m \times m$ matrix of real rational functions of a complex variable $s$, with $G(\infty) < \infty$. Let $\{A, B, C, D\}$ be a minimal realization of $G(s)$. Then $G(s)$ is positive real if and only if there exist real matrices $P$, $L$, and $W$, with $P$ symmetric and positive definite, such that

$$PA + A'P = -L'L \tag{2.5}$$

$$B'P = C - W'L \tag{2.6}$$

$$W'W = D + D'. \tag{2.7}$$
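As an illustration of (2.5)-(2.7), the sketch below checks the lemma on a scalar transfer function. The example $G(s) = (s+2)/(s+1)$, its realization, and all variable names are assumptions chosen here for illustration; they do not come from the paper. For this scalar case, eliminating $L$ and $W$ from (2.5)-(2.7) gives $P^2 - 6P + 1 = 0$, and the smaller root is used.

```python
import numpy as np

# Hypothetical scalar example (not from the paper):
# G(s) = (s + 2)/(s + 1) = 1 + 1/(s + 1), realized with A = -1, B = C = D = 1.
A, B, C, D = -1.0, 1.0, 1.0, 1.0
R = D + D  # R = D + D' in the scalar case

# Smaller root of P^2 - 6P + 1 = 0, obtained by eliminating L and W.
P = 3.0 - 2.0 * np.sqrt(2.0)
L = np.sqrt(2.0 * P)  # from (2.5): P*A + A*P = -L*L with A = -1
W = np.sqrt(R)        # from (2.7): W*W = D + D'

# Residuals of the three lemma equations; all should vanish.
residuals = {
    "2.5": P * A + A * P + L * L,
    "2.6": B * P - (C - W * L),
    "2.7": W * W - (D + D),
}

# Independent frequency-domain check: Re G(jw) >= 0 on a grid of frequencies.
w = np.logspace(-3, 3, 2001)
G = D + C * B / (1j * w - A)
min_real_part = G.real.min()

print(residuals, min_real_part)
```

A grid check of this kind is only a numerical sanity test; the state-space equations are what the lemma actually certifies.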

Remark 2.2: The generalized Legendre-Clebsch condition, which is a necessary condition for $V^*[x_0, t_0] > -\infty$ in the totally singular case, for a linear time-invariant system as given in [3], is

$$\frac{\partial}{\partial u}\left(\frac{d}{dt} H_u\right) = CB - (CB)' = 0 \tag{2.8}$$

$$\frac{\partial}{\partial u}\left(\frac{d^2}{dt^2} H_u\right) = CAB + (CAB)' \le 0 \tag{2.9}$$

where $H$ is the variational Hamiltonian and $\lambda \in R^n$ is the associated Lagrange multiplier, defined by

$$H = u'Cx + \lambda'(Ax + Bu),$$

$$\dot{\lambda}' = -H_x.$$

By letting $D = 0$, (2.8) and (2.9) are also obtained from the positive real lemma. Equation (2.8) can be obtained from (2.6), which in this case is $B'P = C$. Equation (2.9) can be obtained by pre- and post-multiplying (2.5) by $B'$ and $B$, respectively, and then applying $B'P = C$.
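For completeness, the two steps in the remark can be written out. This is a sketch that uses only (2.5)-(2.7) with $D = 0$, so that $W = 0$ and $B'P = C$:

```latex
\begin{align*}
CB &= B'PB = (B'PB)' = (CB)', \\
B'(PA + A'P)B &= -B'L'LB, \\
CAB + (CAB)' &= -(LB)'(LB) \le 0 .
\end{align*}
```

The first line is (2.8), since $P$ is symmetric; the last line is (2.9), using $B'PAB = CAB$ and $B'A'PB = (CAB)'$.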

III. Positive Real Conditions in Terms of State-Space Matrices

Let $\{A, B, C, D\}$ be a minimal realization of $\bar{G}(s) = \Gamma' G(s) \Gamma$ [see (2.3)]. In terms of state-space matrices, (2.4) gives necessary and sufficient conditions for a positive real system. In this section, necessary and sufficient conditions for positive real systems are developed using singular optimal control [10].

To simplify the notation, we assume that $B$, $C$, and $R$ admit the following partition

$$B = [B_r,\ B_s], \qquad C = \begin{bmatrix} C_r \\ C_s \end{bmatrix}, \qquad R = D + D' = \begin{bmatrix} R_r & 0 \\ 0 & 0 \end{bmatrix}$$

where $B_r$ is an $n \times r$ matrix, $B_s$ is an $n \times s$ matrix, $C_r$ is an $r \times n$ matrix, $C_s$ is an $s \times n$ matrix, $R$ is an $m \times m$ matrix, and $R_r$ is an $r \times r$ matrix. Here $r = \operatorname{rank}(R)$ is the dimension of the nonsingular control, and $s = m - r$ is the dimension of the singular control.

$R$ being positive semidefinite is a necessary condition for (2.4) to be satisfied. If $R > 0$, (2.4) can be reduced to a condition based upon a Riccati equation. That is, there exists a symmetric negative definite solution $\pi$ to the algebraic Riccati equation

$$\pi(A - BR^{-1}C) + (A - BR^{-1}C)'\pi - \pi BR^{-1}B'\pi - C'R^{-1}C = 0. \tag{3.1}$$

If $R$ is singular, (2.4) can be rewritten with some matrix $V$ as

$$\begin{bmatrix} \pi A + A'\pi & \pi B_r + C_r' & \pi B_s + C_s' \\ B_r'\pi + C_r & R_r & 0 \\ B_s'\pi + C_s & 0 & 0 \end{bmatrix} = V'V$$

or, equivalently, there exist a $\pi < 0$ and some matrix $V_r$ such that

$$\pi B_s + C_s' = 0 \tag{3.2}$$

and

$$\begin{bmatrix} \pi A + A'\pi & \pi B_r + C_r' \\ B_r'\pi + C_r & R_r \end{bmatrix} = V_r'V_r. \tag{3.3}$$

Under conditions (3.2) and (3.3), two cases are treated separately.

Case 1) If the dimension of the state is less than or equal to the dimension of the singular control, i.e., $n \le s$, $\pi$ can be determined from (3.2). The system is positive real if and only if a solution $\pi < 0$ can be solved from (3.2) and the same $\pi$ satisfies (3.3).

Case 2) If $n > s$, (3.2) and $\pi$ being symmetric and negative definite imply that

$$C_sB_s = (C_sB_s)' = -B_s'\pi B_s > 0. \tag{3.4}$$

Note that (3.4) is necessary, but not sufficient. See (2.8) for a variational interpretation. The new necessary and sufficient conditions are derived for Case 2) in the next subsection.

A. Derivation of Transformed Necessary and Sufficient Conditions

For the (partially) singular problem, (3.2) and (3.3) serve as necessary and sufficient conditions for positive realness of a system. Since (3.2) and (3.4) partially determine the structure of $\pi$, the original problem can be transformed into the positive realness of a reduced-order system.

Given that (3.4) holds, one can perform the following transformation with

$$T = \begin{bmatrix} N \\ C_s \end{bmatrix}, \qquad T^{-1} = \left[M,\ B_s(C_sB_s)^{-1}\right]$$

where $N$ and $M$ are $(n-s) \times n$ and $n \times (n-s)$ matrices consisting of bases of the null spaces of $B_s$ and $C_s$, respectively, such that

$$NB_s = 0, \qquad C_sM = 0, \qquad NM = I_{n-s}.$$

The new realization of $G(s)$ becomes $\{A_T, B_T, C_T, D\}$, where

$$A_T = TAT^{-1} = \begin{bmatrix} NAM & NAB_s(C_sB_s)^{-1} \\ C_sAM & C_sAB_s(C_sB_s)^{-1} \end{bmatrix}$$

$$B_T = TB = [TB_r,\ TB_s] = \begin{bmatrix} NB_r & 0 \\ C_sB_r & C_sB_s \end{bmatrix}$$

$$C_T = CT^{-1} = \begin{bmatrix} C_rT^{-1} \\ C_sT^{-1} \end{bmatrix} = \begin{bmatrix} C_rM & C_rB_s(C_sB_s)^{-1} \\ 0 & I \end{bmatrix}.$$

By applying (3.2) and (3.3) with respect to $\{A_T, B_T, C_T, D\}$, we obtain

$$\pi_T \begin{bmatrix} 0 \\ C_sB_s \end{bmatrix} + \begin{bmatrix} 0 \\ I \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{3.5}$$

and

$$\begin{bmatrix} \pi_TA_T + A_T'\pi_T & \pi_T TB_r + (C_rT^{-1})' \\ (TB_r)'\pi_T + C_rT^{-1} & R_r \end{bmatrix} \ge 0. \tag{3.6}$$

To satisfy (3.5), the structure of $\pi_T$ can only be

$$\pi_T = \begin{bmatrix} \pi_1 & 0 \\ 0 & -(C_sB_s)^{-1} \end{bmatrix}. \tag{3.7}$$

In order that $\pi_T < 0$, the $(n-s)$-dimensional $\pi_1$ must be negative definite.

By substituting (3.7) into (3.6), we obtain

$$\begin{bmatrix} \pi_1A_1 + A_1'\pi_1 & \pi_1B_1 + C_1' \\ B_1'\pi_1 + C_1 & R_1 \end{bmatrix} \ge 0 \tag{3.8}$$

where

$$A_1 = NAM \tag{3.9}$$

$$B_1 = \left[NAB_s(C_sB_s)^{-1},\ NB_r\right] \tag{3.10}$$

$$C_1 = \begin{bmatrix} -(C_sB_s)^{-1}C_sAM \\ C_rM \end{bmatrix} \tag{3.11}$$

and

$$R_1 = \begin{bmatrix} -(C_sB_s)^{-1}\left[C_sAB_s + (C_sAB_s)'\right](C_sB_s)^{-1} & (C_sB_s)^{-1}(B_s'C_r' - C_sB_r) \\ (C_rB_s - B_r'C_s')(C_sB_s)^{-1} & R_r \end{bmatrix}. \tag{3.12}$$

According to (2.4) or the positive real lemma, the proof of positive realness of the original system is reduced to the proof of positive realness of the reduced-order system $\{A_1, B_1, C_1, R_1/2\}$. The process is continued until either $R_1$ becomes nonsingular or the dimension of $A_1$ is less than or equal to the dimension of the null space of $R_1$. Furthermore, from (3.8), the upper left-hand block of (3.12) must be nonnegative definite. Note that this satisfies the generalized Legendre-Clebsch condition given in (2.9).
Remark 3.1: The calculation of $N$ and $M$ is well defined. Compute the singular value decompositions of $B_s$ and $C_s$ as

$$B_s = [U_{1b},\ U_{2b}] \begin{bmatrix} \Sigma_b \\ 0 \end{bmatrix} V_b', \qquad C_s = U_c\,[\Sigma_c,\ 0] \begin{bmatrix} V_{1c}' \\ V_{2c}' \end{bmatrix}$$

where $V_b$, $U_c$, $[U_{1b}, U_{2b}]$, and $[V_{1c}, V_{2c}]$ are orthogonal matrices, and $\Sigma_b$ and $\Sigma_c$ are nonsingular. Then we can define $N$ and $M$ as

$$N = U_{2b}', \qquad M = V_{2c}(U_{2b}'V_{2c})^{-1}.$$

B. Necessary and Sufficient Conditions for Positive Realness

The new necessary and sufficient conditions for positive realness are summarized in the following theorem.

Theorem 3.1: The transfer function $G(s) = \{A, B, C, D\}$ is positive real if and only if

i) $R \ge 0$.

ii) If $R > 0$, there exists a positive definite solution $P$ to the following algebraic Riccati equation

$$P(A - BR^{-1}C) + (A - BR^{-1}C)'P + PBR^{-1}B'P + C'R^{-1}C = 0.$$

iii) If $\operatorname{rank} R = r < m$ and $n \le s$, there exists $P = C_s'B_s'(B_sB_s')^{-1} > 0$ satisfying $PB_s = C_s'$, and

$$\begin{bmatrix} -PA - A'P & -PB_r + C_r' \\ -B_r'P + C_r & R_r \end{bmatrix} \ge 0.$$

iv) If $\operatorname{rank} R = r < m$ and $s < n$, then $C_sB_s = (C_sB_s)' > 0$ and $\{A_1, B_1, C_1, R_1/2\}$ is positive real, where $A_1$, $B_1$, $C_1$, and $R_1$ are defined in (3.9)-(3.12).

Condition ii) is obtained by identifying $P$ with $-\pi$ in (3.1). Condition iii) is the interpretation of (3.2) and (3.3) for the case $n \le s$. If $P = -\pi > 0$ exists, then $PB_s = C_s'$ and $PB_sB_s' = C_s'B_s'$, and $P = C_s'B_s'(B_sB_s')^{-1} > 0$. Condition iv) corresponds to the situation discussed in Section III-A.

Remark 3.2: If $G(s)$ is strictly proper, and $CB$ is nonsingular, then the eigenvalues of $A_1 = NAM$ are the transmission zeros of system (1.3) and (1.4). The system's zeros are determined from

$$\det\left\{\begin{bmatrix} A - sI & B \\ -C & 0 \end{bmatrix}\right\} = 0. \tag{3.13}$$

Pre- and post-multiplying $\begin{bmatrix} A - sI & B \\ -C & 0 \end{bmatrix}$ by the nonsingular matrices

$$\begin{bmatrix} U_{2b}' & 0 \\ U_{1b}' & 0 \\ 0 & I \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} V_{2c}(U_{2b}'V_{2c})^{-1} & V_{1c} & 0 \\ 0 & 0 & I \end{bmatrix}$$

respectively, the following matrix is obtained

$$\begin{bmatrix} N(A - sI)M & \cdots & 0 \\ \cdots & \cdots & \Sigma_bV_b' \\ 0 & -U_c\Sigma_c & 0 \end{bmatrix}.$$

Since $\Sigma_bV_b'$ and $U_c\Sigma_c$ are nonsingular, (3.13) is equivalent to

$$\det(N(A - sI)M) = \det(NAM - sI) = 0.$$

That is, the eigenvalues of $NAM$ are the same as the system's finite zeros.

Remark 3.3: From (3.9) and Remark 3.2, we conclude that there are $n - m$ finite zeros for a positive real strictly proper system and all the zeros lie in the closed left-half complex plane. In other words, the system is minimum phase.

To simplify the approach further, a transformation can always be found such that if $n > m$, a minimal realization $\{A, B, C, D\}$ of $G(s)$ can be established so that $C_s = [0, I]$. In particular, this is accomplished by finding a realization in observability canonical form [11]. In this case, $M$ and $N$ can simply be chosen as

$$M = \begin{bmatrix} I \\ 0 \end{bmatrix}, \qquad N = \left[I,\ -B_{s1}B_{s2}^{-1}\right]$$

where $B_s' = [B_{s1}', B_{s2}']$ and $B_{s2}$ is assured invertible.

IV. Synthesis of Positive Real System Via Output Feedback

Consider the linear system

$$\dot{x} = Ax + Bu \tag{4.1}$$

$$y = Cx. \tag{4.2}$$

It can be shown that if $CB = (CB)' > 0$, and the system is strictly minimum phase, then an output feedback gain $K$ can be found such that with $u = u_1 - Ky$, the closed-loop system

$$\dot{x} = (A - BKC)x + Bu_1 \tag{4.3}$$

$$y = Cx \tag{4.4}$$

is positive real with respect to $u_1$ as the input. By applying the new positive real conditions to the system $\{A - BKC, B, C, 0\}$, the selection of $K$ becomes straightforward.

Let $N$ and $M$ be such that $NB = 0$ and $CM = 0$. Then the reduced-order system $\{A_1, B_1, C_1, R_1/2\}$ is the one with

$$A_1 = N(A - BKC)M = NAM$$

$$B_1 = N(A - BKC)B(CB)^{-1} = NAB(CB)^{-1}$$

$$C_1 = -(CB)^{-1}C(A - BKC)M = -(CB)^{-1}CAM$$

$$R_1 = -(CB)^{-1}\left[C(A - BKC)B + B'(A - BKC)'C'\right](CB)^{-1} = -(CB)^{-1}\left[CAB + B'A'C'\right](CB)^{-1} + K + K'.$$

Provided that $CB = (CB)' > 0$, the positive realness of $\{A_1, B_1, C_1, R_1/2\}$ implies the positive realness of the closed-loop system. By applying the positive real lemma, it is shown that for some $K$, there exists a $P_1 > 0$ such that

$$P_1A_1 + A_1'P_1 = -L'L \tag{4.5}$$

$$P_1B_1 - C_1' = -L'W \tag{4.6}$$

$$-(CB)^{-1}\left[CAB + B'A'C'\right](CB)^{-1} + K + K' = W'W. \tag{4.7}$$

Since the original system is strictly minimum phase, by choosing $L$ such that $(A_1, L)$ is completely observable, (4.5) admits a solution $P_1 > 0$. In order that $W$ is solvable from (4.6), $L$ can be chosen as a nonsingular square matrix, which also guarantees that $(A_1, L)$ is observable. Finally, $K$ can be solved from (4.7). The above discussion is summarized in the following theorem.

Theorem 4.1: If the system is minimum phase, and $CB = (CB)' > 0$, there exists a constant feedback matrix $K$ such that the closed-loop system with $u = u_1 - Ky$ is positive real.
V. Examples

Theorem 3.1 introduces a procedure for testing positive real systems that requires only the testing of a series of matrices $R_i \ge 0$, for $i = 0, 1, 2, \ldots, k$, and the solution $P_i > 0$ to an algebraic Riccati equation, where $i$ is the index associated with the new system obtained from the $i$th iteration, and $i = 0$ corresponds to the original system. The testing stops when $R_i$ becomes nonsingular, or $\dim(A_i) = 0$.

The following examples illustrate the application of Theorem 3.1.

»-n  I 

Examples.!:  G(8)  =  •*+_*!+*  •’H?*  I- 

L#t+27+5'  jS+Jj+a  J 

A  minimal  realization  of  G{s)  is  {A,  B,  C,  D]  where 


Thus.  C.  =  (2,  1],  B,  =  pj.  In  the  test.  B  >  0.  C,B.  =  1  >  0. 

By  choosing  M  =  |_2|*  ^  ~  -Ai  =  -3,  Bi  = 

[1,  1],  Cl  =  |^_i  j’  ^  ~  [  1  2]'  “  nondcfinite 

because the Legendre-Clebsch condition fails, the system is not positive real.

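Example 5.2: $G(s) = \dfrac{(s+2)(s+3)}{s(s+1)(s+4)} = \dfrac{s^2 + 5s + 6}{s^3 + 5s^2 + 4s}$. An observable realization of $G(s)$ is given below

$$A = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & -4 \\ 0 & 1 & -5 \end{bmatrix}, \qquad B = \begin{bmatrix} 6 \\ 5 \\ 1 \end{bmatrix}, \qquad C = [0,\ 0,\ 1], \qquad D = 0.$$

In the test, $R = 0$ and $C_sB_s = CB = (CB)' = 1 > 0$. By choosing

$$M = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad N = \begin{bmatrix} 1 & 0 & -6 \\ 0 & 1 & -5 \end{bmatrix}$$

we obtain

$$A_1 = \begin{bmatrix} 0 & -6 \\ 1 & -5 \end{bmatrix}, \qquad B_1 = \begin{bmatrix} 0 \\ 2 \end{bmatrix}, \qquad C_1 = [0,\ -1], \qquad R_1 = 0.$$

Now test positive realness of $\{A_1, B_1, C_1, 0\}$. Since $C_1B_1 = -2 < 0$, the system is not positive real.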
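The reduction step used in this example can be sketched numerically. The code below is an illustrative sketch for the strictly proper case $D = 0$ (all controls singular); the function names `null_basis` and `reduce_once` are hypothetical, and $N$ and $M$ are built from SVD null-space bases rather than from the canonical choice, so $A_1$, $B_1$, and $C_1$ may differ from a hand computation by a similarity transformation. The eigenvalues of $A_1$ and the product $C_1B_1$ do not depend on that choice.

```python
import numpy as np

def null_basis(X, tol=1e-10):
    """Columns form an orthonormal basis of the right null space of X."""
    _, s, vt = np.linalg.svd(X)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

def reduce_once(A, B, C):
    """One reduction step, (3.9)-(3.12), specialized to D = 0."""
    CB_inv = np.linalg.inv(C @ B)
    M0 = null_basis(C)                # C @ M0 = 0
    N = null_basis(B.T).T             # N @ B = 0
    M = M0 @ np.linalg.inv(N @ M0)    # rescale so that N @ M = I
    A1 = N @ A @ M
    B1 = N @ A @ B @ CB_inv
    C1 = -CB_inv @ C @ A @ M
    CAB = C @ A @ B
    R1 = -CB_inv @ (CAB + CAB.T) @ CB_inv
    return A1, B1, C1, R1

# Realization of G(s) = (s^2 + 5s + 6)/(s^3 + 5s^2 + 4s) in observable form.
A = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, -4.0], [0.0, 1.0, -5.0]])
B = np.array([[6.0], [5.0], [1.0]])
C = np.array([[0.0, 0.0, 1.0]])

A1, B1, C1, R1 = reduce_once(A, B, C)
zeros = np.sort(np.linalg.eigvals(A1).real)   # finite zeros of G(s)
c1b1 = (C1 @ B1).item()                       # sign test for the next iteration
print(zeros, c1b1, R1)
```

The eigenvalues of $A_1$ reproduce the zeros of $G(s)$ as stated in Remark 3.2, and the sign of $C_1B_1$ decides the second iteration of the test.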

Example 5.3: In this example, we construct an output feedback gain $K$ for the system given in Example 5.2 such that the closed-loop system is positive real. From (4.5), by choosing $L$ to be the identity matrix, we obtain

$$P_1 = \begin{bmatrix} 0.533 & -0.5 \\ -0.5 & 0.7 \end{bmatrix}.$$

By substituting $P_1$ and $L$ into (4.6), $W$ is solved as

$$W = \begin{bmatrix} 1 \\ -2.4 \end{bmatrix}.$$

By substituting $W$ into (4.7), $K = 3.38$ is obtained.
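The three steps of this example, (4.5) through (4.7), can be sketched as follows. This is a minimal illustration under stated assumptions: the reduced matrices are recomputed from the realization of Example 5.2 with the choice $M = [I; 0]$, $N = [I, -B_{s1}B_{s2}^{-1}]$, and the Lyapunov equation is solved with a Kronecker-product linear system rather than a dedicated solver. All variable names are hypothetical.

```python
import numpy as np

# Realization of Example 5.2 (observable form), with D = 0.
A = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, -4.0], [0.0, 1.0, -5.0]])
B = np.array([[6.0], [5.0], [1.0]])
C = np.array([[0.0, 0.0, 1.0]])

# Null-space matrices for C = [0, I]: M = [I; 0], N = [I, -B1 * B2^{-1}].
M = np.vstack([np.eye(2), np.zeros((1, 2))])
N = np.hstack([np.eye(2), -B[:2] / B[2, 0]])

CB_inv = np.linalg.inv(C @ B)
A1 = N @ A @ M
B1 = N @ A @ B @ CB_inv
C1 = -CB_inv @ C @ A @ M

# (4.5): solve P1 A1 + A1' P1 = -L'L with L = I via a Kronecker system.
L = np.eye(2)
I2 = np.eye(2)
lhs = np.kron(A1.T, I2) + np.kron(I2, A1.T)
P1 = np.linalg.solve(lhs, -(L.T @ L).reshape(-1)).reshape(2, 2)

# (4.6): P1 B1 - C1' = -L'W, so W = -L'^{-1} (P1 B1 - C1').
W = -np.linalg.solve(L.T, P1 @ B1 - C1.T)

# (4.7): -(CB)^{-1}(CAB + B'A'C')(CB)^{-1} + K + K' = W'W, scalar K here.
CAB = C @ A @ B
R_open = (-CB_inv @ (CAB + CAB.T) @ CB_inv).item()
K = 0.5 * ((W.T @ W).item() - R_open)

print(P1, W.ravel(), K)
```

Any larger gain also satisfies the matrix inequality behind (4.7), so the value computed here is one admissible choice rather than a unique answer.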


VI. Summary and Conclusions

This paper reviews positive real systems as a subclass of dissipative systems and states the positive real lemma equations. By using the variational problem associated with the partially singular problem, necessary and sufficient conditions for a system to be positive real are stated which are equivalent to the positive real lemma. The Legendre-Clebsch conditions and the zero structure are particularly transparent through the transformation discussed in Section III-B. These positive realness conditions are expressed in terms of state-space matrix inequalities and an algebraic Riccati equation of possibly reduced dimension. These conditions do not deal with inequalities tested over the frequency domain or with searching for matrices that satisfy the positive real lemma equations. Essentially, the direct test developed here provides a methodology for using the positive real lemma. A system either satisfies these conditions or does not. These conditions also make the synthesis of positive real systems straightforward. Examples are given which demonstrate the power of this approach.
Abstract: Using sample path comparisons, we study the stochastic monotonicity and concavity properties of the moving and jumping window flow control mechanisms, the leaky bucket flow control mechanism, and the token bank rate control throttle to determine the effect control parameters have on throughputs and downstream congestion levels. Results are developed without distributional assumptions on the packet arrival streams and make use of specially-tailored closed queueing networks. Performance comparisons between the mechanisms are presented.
Risk-sensitive Estimation and A Differential Game

Ravi N. Banavar * and Jason L. Speyer †

Abstract

A large deviation result is employed to solve the state estimation problem of a continuous time Gauss-Markov system with an exponential cost. The exponential cost is the expected value of an exponential function of the state estimation error. A scalar $\theta$ appearing in the cost, termed the risk factor, determines the penalty on the higher order moments of the error. In contrast to the minimum variance estimate, a penalty on large deviations of the estimation error is possible.

* Asst. Professor, Systems and Control Engineering, Indian Institute of Technology, Bombay - 400 076, INDIA.

† Professor, Mechanical, Aerospace and Nuclear Engineering Department, UCLA, Los Angeles, CA 90024.

1 Introduction

The expected value of an exponential function as a performance measure, henceforth referred to as the exponential cost (EC), was initially proposed by Jacobson [11]. The cost is employed as an optimality criterion for a Gauss-Markov system with perfect knowledge of the system's states. The EC includes as a special case the LQG cost. The performance measure was later examined for the case of imperfect state observation by [17, 20, 2, 12]. In [21] Whittle views the cost as a risk-sensitive criterion and lends a desirable certainty equivalence principle to the problem. An application to missile guidance is demonstrated in [18].

Interest in the exponential cost problem was revived when the solutions obtained from it were linked to those of the $H_\infty$ problem [8, 4]. In his initial paper, Jacobson [11] demonstrates the link between the EC and a differential game. Solutions to the $H_\infty$ problem for time-varying systems are obtainable through a differential game approach [1, 15, 13]. The differential game commonality links the stochastic problem with the EC to the deterministic problem with a worst-case measure ($H_\infty$). Most of the existing results on the exponential cost problem assume an exact equivalence to optimizing a quadratic performance measure. More recently, the problem has been re-examined from a large deviation perspective. Whittle [24, 25] states a risk-sensitive maximum principle derived from a result in large deviation theory. Non-linear extensions to the same are found in [6]. Recasting the problem into a large deviation framework highlights the sub-optimal nature of the solution obtained through a quadratic kernel optimization. This is also in consonance with the deterministic $H_\infty$ theory, where sub-optimal solutions are obtained [4].

This paper examines the continuous time version of the estimation problem on a finite time interval. Speyer et al. [19] have considered the discrete time version of the problem. As stated in [19], the estimation problem in this stochastic setting is particularly interesting since the cost function is a curious exception to the family of functions proposed by Sherman [16, 9] that yield a conditional mean as the optimal estimator. Here, the performance measure is recast into a large deviation framework and a sub-optimal solution is obtained by invoking a result in large deviation theory. The approach is similar to Whittle's [25]. The two main theorems are stated and proved in Section 3.


2 Preliminaries

Consider a Markov process

$$\dot{x} = Ax + Bw, \qquad x \in R^n,\ w \in R^p \tag{1}$$

with a measurement

$$y = Cx + v, \qquad y \in R^q,\ v \in R^q \tag{2}$$

where $f(x(t)\,|\,X_{t^-}) = f(x(t)\,|\,x(t^-))$ is the Markov assumption and $A$, $B$, $C$ are time-varying matrices of appropriate dimension. The system noise $w$ and the measurement noise $v$ are assumed to be white with zero mean and covariances $W$ and $V$. The cost function is defined as

$$J = E\left[e^{-\theta L}\right], \qquad L = \frac{1}{2}\int_0^T \|x - \hat{x}\|^2\, d\tau \tag{3}$$

where $\|x - \hat{x}\|^2 = (x - \hat{x})^T(x - \hat{x})$ and the expectation is over the random variables $w$, $v$ and the initial state $x(0)$, and is induced by the estimate $\hat{x}$. The cost function is minimized with respect to the estimate $\hat{x}(t)$

$$\min_{\hat{x}} J \tag{4}$$

where $\hat{x}(t)$ is restricted to a causal function of the information $Y_t$, with

$$Y_t = \{y(\tau) : 0 \le \tau \le t\}, \qquad X_t = \{x(\tau) : 0 \le \tau \le t\}$$

and the distribution of the state at the initial time is Gaussian, given by

$$f(x(0)) = \frac{\exp\left\{-\frac{1}{2}(x(0) - \hat{x}_0)^T P_0^{-1}(x(0) - \hat{x}_0)\right\}}{(2\pi)^{n/2}(\det(P_0))^{0.5}}$$

where $\hat{x}_0$ is the mean (a priori estimate at time $t = 0$) and $P_0 > 0$ is the covariance matrix.


The EC penalizes large deviations of the estimation error. A better picture of the cost function is obtained by expanding the exponential function as

$$J = E\left[1 - \theta L + \frac{\theta^2 L^2}{2!} - \frac{\theta^3 L^3}{3!} + \ldots\right] \tag{5}$$

As $|\theta|$ gets large ($\gg 1$), the higher order moments $E[L^3], E[L^4], \ldots$ are weighted more heavily than $E[L], E[L^2]$. As $|\theta|$ gets small ($\ll 1$), the lower order moments are weighted heavily and, in particular, as $\theta \to 0$ the second moment is the dominant factor in the cost and the problem reduces to a minimum variance measure. The risk factor $\theta$ allows a certain degree of freedom in shaping the probability density function of the estimation error.
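The effect of the risk factor can be illustrated with a small numerical sketch. The two discrete distributions of $L$ below are hypothetical and are not taken from the paper; they share the same mean but differ in their tails, and the cost is evaluated as $E[e^{-\theta L}]$, the form implied by the expansion (5).

```python
import math

def exponential_cost(values, probs, theta):
    """E[exp(-theta * L)] for a discrete distribution of L."""
    return sum(p * math.exp(-theta * L) for L, p in zip(values, probs))

def mean(values, probs):
    """E[L] for a discrete distribution of L."""
    return sum(p * L for L, p in zip(values, probs))

# Two hypothetical error distributions with the same mean E[L] = 1.
steady = ([1.0], [1.0])               # L is always 1
heavy = ([0.0, 10.0], [0.9, 0.1])     # usually 0, occasionally 10

for theta in (-0.01, -0.5):
    j_steady = exponential_cost(*steady, theta)
    j_heavy = exponential_cost(*heavy, theta)
    print(theta, j_steady, j_heavy, j_heavy / j_steady)
```

For $\theta$ close to zero the two costs are nearly equal, because only $E[L]$ matters; for a more negative $\theta$ the heavy-tailed distribution is penalized far more strongly.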

3 Main Results

The problem is cast into a large-deviation [5] framework and the solution is obtained by invoking a large deviation result.

Preliminaries: Consider the course of the Markov process with state variables $(x, z)$ where

$$\dot{x} = Ax + Bw$$

and

$$\dot{z} = Cx + v$$

over the time interval $[0, T]$.

The derivative characteristic function $H$ and the action functional $D_{0T}$ [7] corresponding to this Markov process are defined as

$$H(x, \alpha, \beta) = \alpha^T(Ax) + \beta^T(Cx) + \frac{1}{2}\left[\alpha^TBWB^T\alpha + \beta^TV\beta\right] \tag{6}$$

where $\alpha$ and $\beta$ are conjugate variables (Lagrange multipliers), and

$$D_{0T}(x(\cdot), z(\cdot)) = \sup_{\alpha(\cdot),\beta(\cdot)} \int_0^T \left[\alpha^T\dot{x} + \beta^T\dot{z} - H(x, \alpha, \beta)\right] d\tau \tag{7}$$

where

$$z(\cdot) = \{z(\tau) : 0 \le \tau \le T\}, \qquad x(\cdot) = \{x(\tau) : 0 \le \tau \le T\}.$$

When the initial state $x(0)$ is unobserved and supposed random (here Gaussian), then

$$D_{0T}(x(\cdot), z(\cdot)) = D_0(x(0)) + \sup_{\alpha(\cdot),\beta(\cdot)} \int_0^T \left[\alpha^T\dot{x} + \beta^T\dot{z} - H(x, \alpha, \beta)\right] d\tau \tag{8}$$

where

$$f(x(0)) \propto \exp\{-D_0(x(0))\}$$

is the probability distribution of the initial state.

With these preliminaries in view, consider the cost (3). It can be rewritten in terms of $E[e^{-k\theta L}]$, where $k$ is a positive scalar parameter. Note that this modification alters the performance index but does not affect the optimal estimate. Let $\epsilon = 1/k$. A large deviation result [5, 7] applied to the present problem states that

$$\lim_{\epsilon \to 0} \epsilon \log\left(E[e^{-k\theta L}]\right) = -\operatorname*{ess\,inf}_{x(\cdot),z(\cdot)} \{D_{0T}(x(\cdot), z(\cdot)) + \theta L\}$$

where "ess" is over the appropriate measure. From the above result, for large $k$ the performance index $E[e^{-k\theta L}]$ is logarithmically asymptotic to

$$\exp\left\{-k \operatorname*{ess\,inf}_{x(\cdot),z(\cdot)} \{D_{0T}(x(\cdot), z(\cdot)) + \theta L\}\right\}$$

or

$$E[e^{-k\theta L}] \asymp \exp\left\{-k \operatorname*{ess\,inf}_{x(\cdot),z(\cdot)} \{D_{0T}(x(\cdot), z(\cdot)) + \theta L\}\right\} \tag{9}$$

where $\asymp$ denotes logarithmic asymptoticity.

With this background, the main theorem of the paper is stated. The theorem below is akin to the risk-sensitive maximum principle in [25] applied to the case of control with imperfect observation. Here it is derived for the problem of state estimation.

Theorem 1 The sub-optimal estimate $\hat{x}(t)$ is obtained by seeking the infimum of the linear-quadratic performance index

$$J_s = \frac{1}{\theta}D_0(x(0)) + \frac{1}{2}\int_0^T \left[\|x - \hat{x}\|^2 + \frac{1}{\theta}\left(w^TW^{-1}w + (y - Cx)^TV^{-1}(y - Cx)\right)\right] d\tau \tag{10}$$

with respect to the estimates $\{\hat{x}(t) : 0 \le t \le T\}$, and the supremum (infimum) with respect to $\{x(0), w(\tau) : 0 \le \tau \le T\}$ and $\{y(\tau) : 0 \le \tau \le T\}$ for $\theta < 0$ ($\theta > 0$). The performance index is subject to (1).

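Proof:

From (9),

$$E[e^{-k\theta L}] \asymp \exp\left\{-k \inf_{x(\cdot),z(\cdot)} \{D_{0T}(x(\cdot), z(\cdot)) + \theta L\}\right\}.$$

Let

$$\bar{J} = \exp\left\{-k \inf_{x(\cdot),z(\cdot)} \{D_{0T}(x(\cdot), z(\cdot)) + \theta L\}\right\}. \tag{11}$$

Minimizing $J$ with respect to $\hat{x}(\cdot)$ is equivalent to minimizing $\bar{J}$ with respect to $\hat{x}(\cdot)$. We shall consider the case $\theta < 0$ only. The proof is similar for $\theta > 0$.

For $\theta < 0$,

$$\inf_{\hat{x}(\cdot)} \bar{J} = \inf_{\hat{x}(\cdot)} \exp\left\{-\theta k \sup_{x(\cdot),z(\cdot)} \left\{\frac{1}{\theta}D_{0T}(x(\cdot), z(\cdot)) + L\right\}\right\} \tag{12}$$

$$= \exp\left\{-\theta k \inf_{\hat{x}(\cdot)} \sup_{x(\cdot),z(\cdot)} \left\{\frac{1}{\theta}D_{0T}(x(\cdot), z(\cdot)) + L\right\}\right\}. \tag{13}$$

Now examine the term

$$\sup_{\alpha(\cdot),\beta(\cdot)} \int_0^T \left[\alpha^T\dot{x} + \beta^T\dot{z} - H(x, \alpha, \beta)\right] d\tau$$

in the action functional (8). By a straightforward completion of squares it is shown to be

$$\int_0^T \frac{1}{2}\left[w^TW^{-1}w + v^TV^{-1}v\right] d\tau$$

which in turn can be expressed as

$$\int_0^T \frac{1}{2}\left[w^TW^{-1}w + (y - Cx)^TV^{-1}(y - Cx)\right] d\tau$$

and substituting in (8) results in

$$D_{0T}(x(\cdot), z(\cdot)) = D_0(x(0)) + \frac{1}{2}\int_0^T \left[w^TW^{-1}w + (y - Cx)^TV^{-1}(y - Cx)\right] d\tau. \tag{14}$$

Define

$$J_s = \frac{1}{\theta}D_{0T}(x(\cdot), z(\cdot)) + L. \tag{15}$$

From (13) and (14) the sub-optimal estimate is obtained by solving the differential game

$$\inf_{\hat{x}(\cdot)} \sup_{x(\cdot),z(\cdot)} \left\{\frac{1}{\theta}D_{0T}(x(\cdot), z(\cdot)) + L\right\}. \tag{16}$$

QED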

Remark 1: Note that in the case $\theta < 0$, the linear-quadratic problem is a differential game, and for $\theta > 0$ the problem reduces to a one-sided optimization.

Remark 2: The estimate is termed sub-optimal since the differential game formulation is equivalent to the original problem only in the limiting case as $\epsilon \to 0$.

Theorem 2 A saddle point solution to the differential game $J_s$ exists if and only if there exists a solution to the Riccati differential equation

$$\dot{P} = PA^T + AP + BWB^T - P(C^TV^{-1}C + \theta I)P, \qquad P(0) = P_0 \tag{17}$$

over the time interval $[0, T]$. The optimal values are then given by

$$x(0)^* = \hat{x}_0$$
$$w^* = WB^TP^{-1}(x - \hat{x})$$
$$y^* = Cx$$
$$\dot{\hat{x}} = A\hat{x} + PC^TV^{-1}(y - C\hat{x}), \qquad \hat{x}(0) = \hat{x}_0. \tag{18}$$


Proof:

Assume $V(x(t), \hat{x}(t), t)$ is the optimal return function [3] at time $t$. (All decisions are assumed optimal over the interval $[t, T]$ and $\hat{x}(t)$ is the a priori estimate at time $t$.) The saddle solution is sought by solving the Hamilton-Jacobi-Bellman partial differential equation [3] or Isaacs' equation [10]

$$-\frac{\partial V}{\partial t} = \inf_{\hat{x}} \sup_{w,y} \left[V_x\dot{x} + M(x(t), \hat{x}(t), w(t), y(t), t)\right] \tag{19}$$

where

$$M(x(t), \hat{x}(t), w(t), y(t), t) = \frac{1}{2}\|x(t) - \hat{x}(t)\|^2 + \frac{1}{2\theta}\left[w^TW^{-1}w + (y(t) - Cx(t))^TV^{-1}(y(t) - Cx(t))\right]. \tag{20}$$


Assume  a  solution 


Substituting  (21)  in  (19)  and  subsequent  algebraic  manipulation  results  in 

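$$x(0)^* = \hat{x}_0$$
$$w^* = WB^TP^{-1}(x - \hat{x})$$
$$y^* = Cx$$
$$\dot{\hat{x}} = A\hat{x} + PC^TV^{-1}(y - C\hat{x}), \qquad \hat{x}(0) = \hat{x}_0 \tag{23}$$

where

$$\dot{P} = PA^T + AP + BWB^T - P(C^TV^{-1}C + \theta I)P, \qquad P(0) = P_0. \tag{24}$$

With the optimal values given by (23),

$$J_s(\hat{x}^*, y, w, x(0)) \le J_s(\hat{x}^*, y^*, w^*, x(0)^*) \le J_s(\hat{x}, y^*, w^*, x(0)^*)$$

$$\forall\, x(0) \in R^n, \quad \hat{x}, w, y \in L_2[0, T]. \tag{25}$$

This proves sufficiency.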

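(Necessity): Assume that $P(t)$ becomes unbounded at time $t_e$, $0 < t_e < T$, that is, $P(t) \to \infty$ as $t \to t_e$ from below. Let $\epsilon$ be a small positive number. The variation is

$$\Delta J_s = J_s(\hat{x}^*, y, w, x(0)) - J_s(\hat{x}^*, y^*, w^*, x(0)^*)$$

$$= \frac{1}{2\theta}\|x(0) - \hat{x}_0\|^2_{P_0^{-1}} + \int_0^T \left[\frac{1}{2}\|\Delta e^*\|^2 + \frac{1}{2\theta}\left(\|w\|^2_{W^{-1}} + \|y - Cx\|^2_{V^{-1}}\right)\right] dt \tag{26}$$

where $\Delta e^* = x - \hat{x}^*$ represents the error in the optimal estimate. The existence of $P(t)$ is not assumed for the interval $[t_e - \epsilon, T]$ and consequently the optimal estimator given by (18) does not exist in this interval. In the interval $[t_e - \epsilon, T]$, $\hat{x}^*$ represents the optimal strategy of $\hat{x}$, no longer governed by the dynamics (18). Note that the optimal estimator over this interval of time may be non-linear as well.

Now consider the following strategies for $y$ and $w$:

$$w = WB^TP^{-1}\Delta e^*, \quad y = C\hat{x}^* \qquad \forall t \in [0, t_e - \epsilon]$$
$$w = 0, \quad y = Cx \qquad \forall t \in (t_e - \epsilon, T]. \tag{27}$$

Then

$$\Delta J_s = \frac{1}{2\theta}\|\Delta e^*(t_e - \epsilon)\|^2_{P^{-1}(t_e - \epsilon)} + \frac{1}{2}\int_{t_e - \epsilon}^T \|\Delta e^*\|^2\, dt. \tag{28}$$

As $\epsilon \to 0$, $P^{-1}(t_e - \epsilon)$ tends to a singular matrix. For the interval $[0, t_e - \epsilon]$, $\Delta e^*$ is governed by

$$\Delta\dot{e}^* = A\Delta e^* + B(WB^TP^{-1}\Delta e^*) = (A + BWB^TP^{-1})\Delta e^*. \tag{29}$$

The solution to the linear dynamic equation (29) can be expressed in terms of a state-transition matrix,

$$\Delta e^*(t_e - \epsilon) = \Phi((t_e - \epsilon), 0)\Delta e^*(0) \tag{30}$$

where $\Phi(\cdot, \cdot)$ is the state-transition matrix of $(A + BWB^TP^{-1})$ and $\Delta e^*(0) = x(0) - \hat{x}_0$. Since $x(0)$ is arbitrary, $x(0)$ can be chosen such that $\lim_{\epsilon \to 0} \Delta e^*(t_e - \epsilon)$ lies in the null space of $\lim_{\epsilon \to 0} P^{-1}(t_e - \epsilon)$. Note that the latter is a singular matrix. Then

$$\lim_{\epsilon \to 0} \frac{1}{2\theta}\|\Delta e^*(t_e - \epsilon)\|^2_{P^{-1}(t_e - \epsilon)} = 0 \tag{31}$$

and

$$\Delta J_s = \lim_{\epsilon \to 0} \frac{1}{2}\int_{t_e - \epsilon}^T \|\Delta e^*\|^2\, dt > 0. \tag{32}$$

Therefore, (32) contradicts the assumption that $\hat{x}^*$ is optimal. Hence, $P(t)$ must exist $\forall t \in [0, T]$.

QED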


Remark 3: If the solution to the Riccati differential equation (17) exists over the time interval $[0, T]$, then the solution remains positive definite throughout the interval [15, 26].
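The dependence of existence on the risk factor can be seen in a scalar sketch of (17). The parameter values below (a = 0, b = w = c = v = 1) and the function names are assumptions chosen so that the closed-form solution is known: for $\theta = 0$ the equation is $\dot{p} = 1 - p^2$, which stays bounded, while for $\theta = -2$ it is $\dot{p} = 1 + p^2$, whose solution $\tan(t + \arctan p_0)$ escapes in finite time.

```python
import math

def riccati_rhs(p, a, b, w, c, v, theta):
    """Right-hand side of the scalar form of the Riccati equation (17)."""
    return 2.0 * a * p + b * b * w - (c * c / v + theta) * p * p

def integrate_riccati(p0, theta, T, a=0.0, b=1.0, w=1.0, c=1.0, v=1.0,
                      dt=1e-4, bound=1e6):
    """Fixed-step RK4 integration of the scalar Riccati equation.

    Returns (p, t_escape); t_escape is None if |p| stays below `bound` on [0, T].
    """
    p, t = p0, 0.0
    for _ in range(int(round(T / dt))):
        k1 = riccati_rhs(p, a, b, w, c, v, theta)
        k2 = riccati_rhs(p + 0.5 * dt * k1, a, b, w, c, v, theta)
        k3 = riccati_rhs(p + 0.5 * dt * k2, a, b, w, c, v, theta)
        k4 = riccati_rhs(p + dt * k3, a, b, w, c, v, theta)
        p += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
        if not math.isfinite(p) or abs(p) > bound:
            return p, t
    return p, None

p_kalman, escape_kalman = integrate_riccati(p0=0.5, theta=0.0, T=5.0)
p_risk, escape_risk = integrate_riccati(p0=1.0, theta=-2.0, T=2.0)
print(p_kalman, escape_kalman, escape_risk)
```

With $\theta = 0$ the equation is the usual Kalman filter Riccati equation and the solution settles near its steady state; with a sufficiently negative $\theta$ the solution ceases to exist before the end of the interval, so by Theorem 2 no saddle point exists on that horizon.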

Remark 4: For $\theta < 0$, the estimator views the initial state $x(0)$, the disturbance $w$, and the measurement corruption $v$ as adversaries and adopts the safest (risk-averse) strategy.

For $\theta > 0$ the estimator views the actions of the initial state $x(0)$, the disturbance $w$, and the measurement corruption $v$ as friendly and adopts a more adventurous (risk-prone) strategy.

Remark 5: Define an attenuation function

$$\gamma \triangleq \sup \frac{\int_0^T \|x - \hat{x}\|^2\, dt}{\|x(0) - \hat{x}_0\|^2_{P_0^{-1}} + \int_0^T \left(\|w\|^2_{W^{-1}} + \|v\|^2_{V^{-1}}\right) dt} \tag{33}$$

where $((x(0) - \hat{x}_0), w, v) \in (R^n, L_2(0, T), L_2(0, T))$ and sup stands for supremum. The system dynamics (1) and measurement (2) are now deterministic and the objective is to find an optimal estimator which bounds $\gamma$, i.e.,

$$\gamma \le -\frac{1}{\theta}$$

where $\theta$ is a negative scalar. An optimal estimator for the performance measure (33) is identical to (18) and for the time-invariant case (with a few additional assumptions) is the $H_\infty$ optimal solution [14].


4 Conclusions

The large deviation approach applied to the exponential cost problem highlights the sub-optimal nature of the solution obtained through a quadratic kernel optimization. This feature is very similar to that encountered in the deterministic counterpart, the $H_\infty$ optimal problem. Even in the latter case, sub-optimal solutions are obtained. The estimator is not the typical conditional mean (Kalman) filter and is an exception to the large class of cost functions proposed by Sherman. Existence of the optimal solution depends on the existence of the solution to a Riccati differential equation, which in turn depends on the risk factor $\theta$.

Acknowledgements The first author appreciates the comments of the anonymous reviewer regarding Theorem 1.
CENTRALIZED AND DECENTRALIZED SOLUTIONS OF THE
LINEAR-EXPONENTIAL-GAUSSIAN PROBLEM *

Chih-hai Fan,† Jason L. Speyer,‡ and Christian R. Jaensch§


ABSTRACT

A particular class of stochastic control problems constrained to different information patterns is considered. This class consists of minimizing the expectation of an exponential cost criterion with quadratic argument subject to a discrete-time Gauss-Markov dynamic system, i.e., the Linear-Exponential-Gaussian (LEG) control problem. Besides the one-step delayed information pattern previously considered, the classical (includes current observations) and the one-step delayed information-sharing (OSDIS) patterns are assumed. After determining the centralized controller based upon the classical information pattern, the optimal decentralized controller based upon the OSDIS pattern and the solution to a static team problem is found to be affine. A unifying approach to determine controllers based upon these three information patterns is obtained by noting that the value of a quadratic exponent of an exponential function is independent of the information structure. Even though the controllers are determined by a backward recursion of this exponent, the value of the cost criterion is not; rather a coefficient of the exponential delineates the value of the cost criterion with respect to the information patterns. Both necessary and sufficient conditions for the controllers to be minimizing are obtained regardless of the exponential form. The negative exponential form is included, which is unimodal but not convex.

1.  INTRODUCTION 

Control problems with an exponential cost, in particular the exponential of a quadratic form, have proved
to be an interesting extension to control problems with only a quadratic cost, such as the well known
linear-quadratic-Gaussian (LQG) control problem. The first to consider the linear-exponential-Gaussian (LEG)
problem was Jacobson [8], who treated the case of perfect state observation. Jacobson pointed out the
relation between the LEG problem and deterministic differential games. Subsequently, Speyer, Deyst, and
Jacobson in [18] derived results for special cases of the general LEG problem with imperfect observations
where they obtained fixed finite-dimensional controllers. Speyer et al. also considered the general LEG

*This work was supported by the Air Force Office of Scientific Research under Grant AFOSR 91-0077.
†Research Assistant, Department of Mechanical, Aerospace and Nuclear Engineering, UCLA.
‡German Aerospace Research Organization (DLR), Wessling-Oberpfaffenhofen, FRG.


problem with imperfect observations, but obtained an optimal controller which is a function of the entire
smoothed history of the state vector from the initial to the current time. Kumar and Van Schuppen [13]
obtained results analogous to [18]. In [21] Whittle obtained the solution to the general LEG problem with a
one-step delayed information pattern and showed that a fixed-dimensional controller resulted. In his analysis,
a risk-sensitive certainty equivalence principle is derived, from which the differential game interpretation
follows. Later, Bensoussan and Van Schuppen obtained in [2] results in continuous time analogous to [21].
A major objective of this paper is to consider in a unified manner LEG problems having various information
patterns. Included is the decentralized LEG problem, which is more naturally formulated in a discrete-time
setting. Therefore, we will not refer to the continuous time LEG optimal control problem any further.

In contrast to centralized control, which assumes all the measurements of the system's state are available
(the classical information pattern), decentralized control focuses on systems in which not all the information
is available. A particular information structure is assumed here, called the one-step delayed
information-sharing pattern (OSDISP), in which each control station (elements of the control vector) has, at
the current time, all the previously implemented control values, all the observations made anywhere in the
system through and including the previous time, and its own observation at the current time. Hence, the
difference between the one-step delayed information-sharing pattern and the classical information pattern is
that the current observations are not shared. Under the one-step delayed information pattern, only the past
observations are available; the current observations are not available to any control station.
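
The three information patterns described above can be made concrete with a small sketch. The Python function below is illustrative only and is not part of the original development: the function name, the pattern labels, and the encoding of an observation as a (station, time) pair are assumptions chosen for this example. It lists which observations a given control station may use at time t under each pattern.

```python
def available_observations(pattern, station, t, num_stations):
    """Return the set of (station, time) observation indices usable at time t.

    pattern is one of the hypothetical labels "classical", "osdis", "delayed".
    """
    # Every pattern shares all observations made strictly before time t.
    past = {(j, s) for j in range(num_stations) for s in range(t)}
    if pattern == "classical":
        # Current observations of all stations are available.
        current = {(j, t) for j in range(num_stations)}
    elif pattern == "osdis":
        # Only the station's own current observation is available.
        current = {(station, t)}
    elif pattern == "delayed":
        # No current observation is available to any station.
        current = set()
    else:
        raise ValueError("unknown information pattern: " + repr(pattern))
    return past | current


classical = available_observations("classical", 0, 2, 3)
osdis = available_observations("osdis", 0, 2, 3)
delayed = available_observations("delayed", 0, 2, 3)
```

With three stations at time 2, the delayed set is contained in the OSDIS set, which is contained in the classical set; the three patterns differ only in how many current observations are visible.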

Dynamic team problems associated with a one-step delayed information-sharing pattern can be conveniently
reduced to a series of static team problems by utilizing the Dynamic Programming method. However,
this is in general not true for arbitrary information patterns [24]. Nevertheless, for different cost functions
the one-step delayed information-sharing pattern has produced recursive solutions for obtaining the optimal
control law. In particular, Sandell and Athans developed in [17] the first recursive solution to the team
problem with a quadratic cost, one-step delayed information-sharing pattern and linear dynamics. For the
exponential of a quadratic cost function without intermediate state penalties, Krainak, Machell, Marcus and
Speyer derived in [12] an analogous result for a one-step delayed information-sharing pattern. In Section 6
we extend that result to the general exponential of a quadratic cost function.

In  the  following  presentation  superscripts  on  vectors  indicate  either  components  or  partition  of  vectors. 
The  following  defines  a  class  of  static  team  problems. 

Definition  1.1 

A team decision problem is concerned with a decision making unit (called a team) consisting of $M$
members, that chooses a value of the decision vector $u$ from a subset $\Xi \subset R^p$ and incurs a cost $C(u, x)$, which
depends upon both the decision $u$ and the prevailing state of the world $x$. The decision variable $u$ is usually
a $p$-tuple of individual decision variables, or an $M$-tuple of individual decision vectors $u^i \in \Xi^i \subset R^{p_i}$, where
$\sum_{i=1}^{M} p_i = p = \dim u$. The set of all possible decision values is some subset of the Cartesian product of the
$\Xi^i$'s, i.e. $\Xi \subset \Xi^1 \times \Xi^2 \times \cdots \times \Xi^M$. We also assume that $x \in \Omega$, where $(\Omega, \Sigma, P)$ is a given probability space
and $\Omega = R^n$ for some $n$, $\Sigma$ consists of the Borel sets on $R^n$ and $P$ is a known probability measure. $C(u, x)$
indicates the penalty associated with each decision for each state of the world and is assumed for all purposes
of this paper to be real-valued and Borel measurable on $(R^p \times \Omega)$. Furthermore, the $i$-th team member has
available to him some observation of the state of the world: $z^i = h^i(x)$, where $\dim z^i = r_i$ and $z$ is the
vector of all the observation values known to the team members with $\dim z = r$. Note that $z^i$ indicates the
$i$-th partition of $z$ and not the $i$-th component. It is also presumed that $h^i(\cdot): R^n \to R^{r_i}$ is a Borel measurable
function to avoid the discussion of pathological cases. Finally, it is supposed that the set of admissible control
laws for the $i$-th team member, $U^i$, consists of all Borel measurable functions $\gamma^i(\cdot): R^{r_i} \to \Xi^i$, where $\Xi^i$ is a
Borel measurable subset of $R^{p_i}$. This means that the control value $u^i$ is solely a function of the observation
$z^i$. The set of all admissible team control laws is defined as $U_T = U^1 \times U^2 \times \cdots \times U^M$. The problem faced
by the team is to select the control law

$$\gamma = (\gamma^1, \gamma^2, \ldots, \gamma^M) \in U_T$$

which minimizes the average cost, $J = E[C(\gamma(z), x)]$. □

For  the  remaining  part  of  this  section  and  throughout  the  paper,  we  adhere  to  the  notation  given  in  the 
above  definition,  unless  noted  otherwise. 

A sufficient condition for global optimality of a class of static team problems is given by Radner [15]. In
particular, it is assumed that for every fixed $x \in \Omega$, $C(u, x)$ is convex. Furthermore, because of the hard to
verify "local finiteness" condition in [15], an extension which circumvents this condition is given by Krainak
et al. in [11]. Further simplifications are obtained here because of the restriction to exponentials with
quadratic arguments.

In Sections 2 and 3 we extend the results presented by Whittle in [21] to the slightly more natural
assumption that the control at the current time is a function of the observation history up to the current
time (the classical information pattern) rather than only up to the previous time (the one-step delayed
information pattern). In addition, in Sections 2 and 3 we contribute detailed proofs of some of the theorems
of [21] where only a proof outline is given. The lemma given by Whittle in [21] converts the expectation
operation of the exponential of the quadratic function to the extremization operation. Using this property,
we avoid the tedious proof in [11] for the exponential cost and give sufficient conditions for both the
risk-averse and risk-preferring cases of the LEG team problem in Section 4. The optimal team control function
for the risk-preferring case is also found to be an affine function. For the risk-preferring case, $C(u, x)$ is an
exponential function but nonconvex. This extends the results of [11], in which only the risk-averse case with a
convex cost function is considered, to a nonconvex but unimodal exponential function. For more general functions
$C(u, x)$, the work in [11] must still be extended. In Section 5, we investigate the relationship among three
different information patterns: the classical information, one-step delayed information, and one-step delayed
information-sharing patterns. Using the results of the classical information and one-step delayed information
patterns, the dynamic programming recursion will be shown to be essentially the same for the three different
information patterns. This property simplifies the derivation of the control gains for the one-step delayed
information-sharing pattern in that the backward algorithm in [10] is no longer needed. Furthermore, only
the coefficient of the exponential in the cost function changes for different information patterns, indicating
their relative optimality. In Section 6 we derive the optimal decentralized controllers for the one-step delayed
information-sharing pattern. This extends the results of Krainak, Machell, Marcus and Speyer in [12], who
examined only the case in which only a terminal state penalty is present. Also, the results in Section 6 are
extended to nonconvex $C(u, x)$ but unimodal exponential functions.

2.  THE  LINEAR-EXPONENTIAL-GAUSSIAN  PROBLEM 


Consider the following linear, stochastic, discrete-time system,

$$x_{t+1} = A_t x_t + B_t u_t + w_t \tag{2.1}$$

where we assume that $\dim(x_t) = n$ and $\dim(u_t) = p$. The state observation is now restricted to the form

$$z_t = H_t x_t + v_t \tag{2.2}$$

where $z_t \in R^r$. Notice that the definition in (2.2) is different from that in [21]. In addition, $x_0$ is normally
distributed with mean $\bar{x}_0$ and covariance $V_0 > 0$, and $\{w_t\}$ and $\{v_t\}$ are assumed to be zero-mean, jointly
Gaussian, independent random variables for all $t$ with known positive-definite covariance
matrices $W_t$ and $\Theta_t$, respectively. Whereas this description of the dynamics suffices for the centralized LEG
problem, a few additional requirements have to be added for the decentralized LEG problem.

Specifically, it is assumed that $u_t$ and $B_t$ are partitioned as

$$u_t = [(u_t^1)^T, \ldots, (u_t^M)^T]^T, \quad B_t = [B_t^1, B_t^2, \ldots, B_t^M] \tag{2.3}$$

where $u_t^i$ corresponds to the control implemented at time $t$ by the $i$-th team member ($i \in \{1, \ldots, M\}$) with
$\dim(u_t^i) = p_i$ and $\sum_{i=1}^{M} p_i = p$. In addition, $z_t$ is partitioned as

$$z_t = [(z_t^1)^T, \ldots, (z_t^M)^T]^T \tag{2.4}$$

where $z_t^i$ corresponds to the observation of the $i$-th team member at time $t$ with $\dim(z_t^i) = r_i$ and $\sum_{i=1}^{M} r_i = r$.
It is also presumed that $\Theta_t$ can be partitioned as

$$\Theta_t = \mathrm{Diag}[\Theta_t^1, \ldots, \Theta_t^M] \tag{2.5}$$

In other words, $v_t^i$ and $v_t^j$ ($i, j \in \{1, \ldots, M\}$, $i \ne j$) are assumed to be independent Gaussian random
variables, where $v_t^i$ corresponds to the noise corrupting the observation of the $i$-th team member.

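The objective is to minimize the following performance index,

$$J(\theta) = E\left[-\theta e^{-\frac{1}{2}\theta\Psi}\right] \tag{2.6}$$

where $\theta$ is a real scalar and $\Psi$ is given by,

$$\Psi = \sum_{t=0}^{N-1}\left(x_t^T Q_t x_t + u_t^T R_t u_t\right) + x_N^T Q_N x_N \tag{2.7}$$

We assume that $Q_t \ge 0$ and $R_t > 0$, and $R_t$ is partitioned in the decentralized problem as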

Rt=  .  (2.8) 

In  addition,  all  matrices  and  vectors  are  assumed  to  be  compatibly  dimensioned. 

The feature distinguishing the decentralized problem from the centralized is the assumed information
pattern. In particular, a one-step delayed information-sharing pattern is assumed for the decentralized LEG
problem which can be described as follows,

$$\bar{Y}_0 = \emptyset \ (\text{empty set}), \quad \bar{Y}_t = \bar{Y}_{t-1} \cup \{z_{t-1}, u_{t-1}\}, \quad Y_t^i = \bar{Y}_t \cup \{z_t^i\}$$

Given this, we require for the decentralized LEG control problem that the $i$-th team member constructs his own
control value, $u_t^i$, according to the mutually agreed law, $\gamma_t^i(\cdot): Y_t^i \to R^{p_i}$, i.e. $u_t^i = \gamma_t^i(Y_t^i) \in U^i$, as discussed
in Definition 1.1, where $U_T$ is the admissible control set.

On the other hand, we require for the LEG problem with the classical information pattern that the control
at time $t$ be a Borel measurable function of the observation history $Y_t$, $\gamma_t(\cdot): Y_t \to R^p$, i.e. $u_t = \gamma_t(Y_t)$,
where $\bar{Y}_0$ is the initial information available on $x_0$ and $Y_t = Y_t^1 \cup \cdots \cup Y_t^M$. The set of all admissible control
functions with the classical information pattern is

$$U_C = \{u_t : u_t = \gamma_t(Y_t)\}$$

The information pattern based on $Y_t$ is called classical, the information pattern based on $Y_t^i$ is called one-step
delayed information-sharing, and the information pattern based on $\bar{Y}_t$ is called one-step delayed. Similar to the
definition of $U_C$, the set of all admissible control functions for the one-step delayed information pattern is

$$U_S = \{u_t : u_t = \gamma_t(\bar{Y}_t)\}$$

From the above definitions, the following relation is obtained,

$$U_S \subset U_T \subset U_C \tag{2.9}$$

In general, we will also adopt the notation,

$$X_t = (x_0, x_1, \ldots, x_t), \quad X_t^N = (x_t, x_{t+1}, \ldots, x_N)$$

with a similar convention for all other variables.

2.1  Some  preliminaries 

Two lemmas from Whittle [21] [22] are given here to form the basis for the dynamic programming
methodology given in Section 2.2. The first lemma is concerned with the relationship between minimization and
integration of Gaussian densities and is restated below somewhat more precisely.

Lemma 2.1: Let $S(u, v; \theta)$ be a quadratic form in the components of the vectors $u$ and $v$ with $\dim(v) = r$.
In other words, let $\xi^T = [u^T, v^T]$, then

$$S(u, v; \theta) = \xi^T S \xi + k^T \xi + n, \quad S = \begin{bmatrix} S_{uu} & S_{uv} \\ S_{vu} & S_{vv} \end{bmatrix}$$

If $\theta > 0$, suppose that $S > 0$ and $S(u, v; \theta)$ attains its minimum at $u = u^*$ and $v = v^*$, whereas if $\theta < 0$,
assume that $S_{uu} > 0$, $S_{vv} < 0$ and $S(u, v; \theta)$ attains its minimax solution at $u = u^*$ and $v = v^*$. Then

$$\min_u \int_{-\infty}^{\infty} -\theta e^{-\frac{1}{2}\theta S(u, v; \theta)}\, dv \propto -\theta e^{-\frac{1}{2}\theta S(u^*, v^*; \theta)} \tag{2.10}$$

and the minimum is attained at $u = u^*$.

Proof. See [9], [5] or [23].

The  importance  of  this  lemma  is  that  it  precisely  relates  the  expectation  operation  on  the  exponential 
with  quadratic  argument  with  respect  to  a  Gaussian  probability  density  to  the  extremization  with  respect 
to  the  random  variable  of  the  quadratic  argument  of  the  exponential. 
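
As a rough numerical illustration of Lemma 2.1, the sketch below compares, in the scalar case, a midpoint-rule estimate of the integral of $e^{-\theta S/2}$ over $v$ with the closed form $(2\pi / (\theta S_{vv}))^{1/2} e^{-\theta\, \mathrm{ext}_v S / 2}$. The function names, the coefficient layout, and the numerical values are assumptions made for this example only; the normalization factor is the scalar counterpart of the constants that appear later in Section 2.2.

```python
import math


def quadratic(u, v, s_uu, s_uv, s_vv, k_u, k_v, n):
    """Scalar instance of S(u, v) = xi^T S xi + k^T xi + n with xi = (u, v)."""
    return s_uu * u * u + 2.0 * s_uv * u * v + s_vv * v * v + k_u * u + k_v * v + n


def extremizing_v(u, s_uv, s_vv, k_v):
    """Stationary point of S in v: a minimum if s_vv > 0, a maximum if s_vv < 0."""
    return -(2.0 * s_uv * u + k_v) / (2.0 * s_vv)


def integral_numeric(u, theta, coeffs, steps=20000):
    """Midpoint-rule estimate of the integral of exp(-theta * S / 2) over v."""
    s_uu, s_uv, s_vv, k_u, k_v, n = coeffs
    if theta * s_vv <= 0.0:
        raise ValueError("theta * s_vv must be positive for the integral to converge")
    v_star = extremizing_v(u, s_uv, s_vv, k_v)
    sigma = 1.0 / math.sqrt(theta * s_vv)
    lower = v_star - 12.0 * sigma          # integrate over +/- 12 standard deviations
    width = 24.0 * sigma / steps
    total = 0.0
    for i in range(steps):
        v = lower + (i + 0.5) * width
        total += math.exp(-0.5 * theta * quadratic(u, v, *coeffs))
    return total * width


def integral_closed_form(u, theta, coeffs):
    """Gaussian integral expressed through the extremized value of S."""
    s_uu, s_uv, s_vv, k_u, k_v, n = coeffs
    v_star = extremizing_v(u, s_uv, s_vv, k_v)
    s_ext = quadratic(u, v_star, *coeffs)
    return math.sqrt(2.0 * math.pi / (theta * s_vv)) * math.exp(-0.5 * theta * s_ext)


# theta > 0 with S_vv > 0 (minimization over v) and theta < 0 with S_vv < 0 (maximization).
case_min = (0.7, 0.5, (2.0, 0.3, 1.5, 0.1, -0.4, 0.2))
case_max = (0.7, -0.5, (2.0, 0.3, -1.5, 0.1, -0.4, 0.2))
```

In both sign cases the integral over $v$ depends on $u$ only through the extremized quadratic, which is what allows each expectation in the dynamic programming recursion to be replaced by an extremization.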

The second preliminary result concerns the explicit form of the conditional probability density of all
unobservables and is stated in detail below. Let $f(X_N, Z_N | U_{N-1})$ denote the joint probability density function
of the random variables $X_N$, $Z_N$ given the admissible control sequence $U_{N-1}$, where

$$X_N = (x_0, x_1, \ldots, x_N), \quad Z_N = (z_0, z_1, \ldots, z_N), \quad U_{N-1} = (u_0, u_1, \ldots, u_{N-1})$$

This notation is consistent with that of Whittle; see [23] equation (2.39).

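Lemma 2.2:

$$f(X_N, Z_N | U_{N-1}) = \prod_{k=1}^{N}\left[f(z_k | x_k) f(x_k | x_{k-1}, u_{k-1})\right] f(z_0 | x_0) f(x_0) \propto e^{-\frac{1}{2}(P + m_N)} \tag{2.11}$$

with

$$P = \sum_{k=0}^{N-1}\left(n_k + \sum_{i=1}^{M} m_k^i\right) + (x_0 - \bar{x}_0)^T V_0^{-1}(x_0 - \bar{x}_0) = \sum_{k=0}^{N-1}(n_k + m_k) + (x_0 - \bar{x}_0)^T V_0^{-1}(x_0 - \bar{x}_0)$$

$$n_k = (x_{k+1} - A_k x_k - B_k u_k)^T W_k^{-1}(x_{k+1} - A_k x_k - B_k u_k)$$

$$m_k^i = (z_k^i - H_k^i x_k)^T (\Theta_k^i)^{-1}(z_k^i - H_k^i x_k), \ i \in \{1, \ldots, M\}, \quad m_k = (z_k - H_k x_k)^T \Theta_k^{-1}(z_k - H_k x_k)$$

and the constant term for (2.11) is

$$\mathrm{const}_1 = (2\pi)^{-\frac{n}{2}} |V_0|^{-\frac{1}{2}} \prod_{k=0}^{N-1}\left((2\pi)^{-\frac{n}{2}} |W_k|^{-\frac{1}{2}}\right) \prod_{k=0}^{N}\left((2\pi)^{-\frac{r}{2}} |\Theta_k|^{-\frac{1}{2}}\right) \tag{2.12}$$

Proof. See [9] or [5].

Define

$$S^\theta = \Psi + \theta^{-1}(P + m_N) \tag{2.13}$$

$S^\theta$ is called "stress" in [21] and $\Psi$ is defined in (2.7). We shall also say that a variable extremizes $S^\theta$ if it
minimizes $S^\theta$ in the case $\theta > 0$, and it maximizes $S^\theta$ when $\theta < 0$.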

2.2  The  Dynamic  Programming  decomposition 

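The Dynamic Programming recursion for the classical information pattern is given as follows. The cost
function is

$$J = \min_{U_{N-1}} E\left[-\theta e^{-\frac{1}{2}\theta\Psi}\right]$$

By applying the Fundamental Theorem of Dynamic Programming, see [18], $J$ is written as follows,

$$J = E\Big[\min_{u_0} E\Big[\min_{u_1} E\Big[\cdots \min_{u_t} E\Big[\cdots \min_{u_{N-1}} E\Big[E\big[-\theta e^{-\frac{1}{2}\theta\Psi} \big| Y_N\big] \Big| \bar{Y}_N\Big] \cdots \Big| \bar{Y}_{t+1}\Big] \cdots \Big| \bar{Y}_2\Big] \Big| \bar{Y}_1\Big]\Big]$$

The above equation can be rewritten more explicitly as follows,

$$\begin{aligned}
J &= \int_{-\infty}^{\infty} \min_{u_0}\Big[\int_{-\infty}^{\infty} \min_{u_1}\Big[\cdots \min_{u_t}\Big[\int_{-\infty}^{\infty} \cdots \min_{u_{N-1}}\Big[\int_{-\infty}^{\infty}\Big(\int_{-\infty}^{\infty} -\theta e^{-\frac{1}{2}\theta\Psi} f(X_N | Y_N)\, dX_N\Big) f(z_N | \bar{Y}_N)\, dz_N\Big] \\
&\qquad \cdots f(z_{t+1} | \bar{Y}_{t+1})\Big]\, dz_{t+1} \cdots f(z_2 | \bar{Y}_2)\, dz_2\Big] f(z_1 | \bar{Y}_1)\, dz_1\Big] f(z_0)\, dz_0 \\
&= \int_{-\infty}^{\infty} \min_{u_0}\Big[\int_{-\infty}^{\infty} \min_{u_1}\Big[\cdots \min_{u_t}\Big[\int_{-\infty}^{\infty} \min_{U_{t+1}^{N-1}}\Big[\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} -\theta e^{-\frac{1}{2}\theta\Psi} f(X_N | Y_N) f(Z_{t+2}^N | Y_{t+1}, U_{t+1}^{N-1})\, dX_N\, dZ_{t+2}^N\Big] \\
&\qquad f(Z_{t+1} | U_t)\, dz_{t+1}\Big] \cdots dz_2\Big]\, dz_1\Big]\, dz_0
\end{aligned} \tag{2.14}$$

From (2.14) the optimal return function is defined as,

$$J_{t+1}(Y_{t+1}) = \min_{U_{t+1}^{N-1}} E\left[-\theta e^{-\frac{1}{2}\theta\Psi} \big| Y_{t+1}\right] f(Z_{t+1} | U_t)$$

Then, the Dynamic Programming recursion rule is

$$J_t(Y_t) = \min_{u_t} \int_{-\infty}^{\infty} J_{t+1}(Y_{t+1})\, dz_{t+1} \tag{2.15}$$

The optimal return function at the final stage, i.e. at $t = N$, after reviewing (2.14) is

$$\begin{aligned}
J_N(Y_N) &= (-\theta) f(Z_N | U_{N-1}) E\left[e^{-\frac{1}{2}\theta\Psi} \big| Y_N\right] \\
&= (-\theta) \int_{-\infty}^{\infty} e^{-\frac{1}{2}\theta\Psi} f(X_N | Y_N) f(Z_N | U_{N-1})\, dX_N \\
&= (-\theta) \int_{-\infty}^{\infty} e^{-\frac{1}{2}\theta\Psi} f(X_N, Z_N | U_{N-1})\, dX_N
\end{aligned}$$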

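From Lemma 2.2

$$J_N(Y_N) = \mathrm{const}_1 (-\theta) \int_{-\infty}^{\infty} e^{-\frac{1}{2}\theta S^\theta}\, dX_N \tag{2.16}$$

where $\mathrm{const}_1$ is given in (2.12). Let $S^\theta_{X_N X_N}$ be equivalent to $S_{vv}$ in Lemma 2.1 and apply Lemma 2.1 to
(2.16). Let $\Phi_N(Y_N) = \operatorname*{ext}_{X_N} S^\theta$ where "ext" means "min" for $\theta > 0$ and "max" for $\theta < 0$, then $J_N(Y_N)$ can
be written as

$$J_N(Y_N) = \mathrm{const}_2 (-\theta) e^{-\frac{1}{2}\theta\Phi_N(Y_N)}, \quad \mathrm{const}_2 = \mathrm{const}_1 (2\pi)^{\frac{(N+1)n}{2}} \left|\theta S^\theta_{X_N X_N}\right|^{-\frac{1}{2}}$$

where $\Phi_N(Y_N)$ is a quadratic function in $Z_N$ and $U_{N-1}$.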

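The optimal return function at time stage $N-1$ is evaluated by using the Dynamic Programming recursion
rule (2.15). Thus,

$$J_{N-1}(Y_{N-1}) = \min_{u_{N-1}} \int_{-\infty}^{\infty} J_N(Y_N)\, dz_N = \min_{u_{N-1}} \int_{-\infty}^{\infty} \mathrm{const}_2 (-\theta) e^{-\frac{1}{2}\theta\Phi_N(Y_N)}\, dz_N$$

Let $S^\theta_{z_N z_N}$ be equivalent to $S_{vv}$ in Lemma 2.1 and apply Lemma 2.1 to the above equation. Let
$\Phi_{N-1}(Y_{N-1}) = \min_{u_{N-1}} \operatorname*{ext}_{z_N} \Phi_N(Y_N)$, then $J_{N-1}(Y_{N-1})$ can be written as

$$J_{N-1}(Y_{N-1}) = \mathrm{const}_3 (-\theta) e^{-\frac{1}{2}\theta\Phi_{N-1}(Y_{N-1})}, \quad \mathrm{const}_3 = \mathrm{const}_2 (2\pi)^{\frac{r}{2}} \left|\theta S^\theta_{z_N z_N}\right|^{-\frac{1}{2}} \tag{2.17}$$

where $\Phi_{N-1}(Y_{N-1})$ is a quadratic function in $Z_{N-1}$ and $U_{N-2}$.
In general, the recursion produces

$$J_t(Y_t) = \mathrm{const}_4 (-\theta) e^{-\frac{1}{2}\theta\Phi_t(Y_t)} \tag{2.18}$$

where $\mathrm{const}_4 = \mathrm{const}_3 \prod_{k=t+1}^{N-1}\left[(2\pi)^{\frac{r}{2}} \left|\theta S^\theta_{z_k z_k}\right|^{-\frac{1}{2}}\right]$ and

$$\Phi_t(Y_t) = \min_{u_t} \operatorname*{ext}_{z_{t+1}} \Phi_{t+1}(Y_{t+1}) \tag{2.19}$$

$\Phi_t(Y_t)$ is a quadratic function in $Z_t$ and $U_{t-1}$, and $u_t$ found from (2.19) is the optimal control at time $t$ for
the classical information pattern.

3. THE OPTIMAL CONTROL LAW WITH THE CLASSICAL INFORMATION PATTERN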


From the derivation in Section 2.2 the following Theorem is obtained, which is applied to the centralized
LEG problem with current observation (the classical information pattern) and is different from Theorem
1 in Whittle [21], who considered the centralized LEG problem with the one-step delayed information pattern.

Theorem 3.1: Let $S^\theta$ be positive definite in $U_t^{N-1}, X_N, Z_{t+1}^N$ when $\theta > 0$ and $S^\theta$ be positive definite in
$U_t^{N-1}$ and negative definite in $X_N, Z_{t+1}^N$ when $\theta < 0$. Suppose $S^\theta$ is minimized with respect to the decision as
yet undetermined $U_t^{N-1} = (u_t, \ldots, u_{N-1})$ and extremized with respect to $X_N$ and the current unobservable
$Z_{t+1}^N$ for a given value of $Y_t$. Then, the value of $u_t$ is the optimal control at time $t$ and the order of the
optimization is irrelevant.

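Proof: From the Dynamic Programming decomposition in Section 2.2, the recursion (2.19) for
the classical information pattern is

$$\Phi_t(Y_t) = \min_{u_t} \operatorname*{ext}_{z_{t+1}} \Phi_{t+1}(Y_{t+1}) \tag{3.1}$$

$$\Phi_N(Y_N) = \operatorname*{ext}_{X_N} S^\theta(X_N, Y_N) \tag{3.2}$$

The optimal control at time $t$ determined from (3.1) assumes the classical information pattern. $\Phi_t(Y_t)$ can
be written as

$$\Phi_t(Y_t) = \min_{u_t}\big[\operatorname*{ext}_{z_{t+1}} \Phi_{t+1}(Y_{t+1})\big] = \min_{u_t}\big[\operatorname*{ext}_{z_{t+1}} \min_{u_{t+1}} \operatorname*{ext}_{z_{t+2}} \Phi_{t+2}(Y_{t+2})\big] = \min_{u_t}\big[\Phi_{t+2}(z_{t+2}^*, z_{t+1}^*, u_{t+1}^*, Z_t, U_t)\big] \tag{3.3}$$

By repeatedly using equation (3.1) in (3.3)

$$\Phi_t(Y_t) = \min_{u_t}\big[\Phi_N(Z_{t+1}^{N*}, U_{t+1}^{N-1*}, Z_t, U_t)\big] \tag{3.4}$$

From (3.2) and (3.4)

$$\Phi_t(Y_t) = \min_{u_t}\big[S^\theta(X_N^*, Z_{t+1}^{N*}, U_{t+1}^{N-1*}, Z_t, U_t)\big] = \min_{U_t^{N-1}} \operatorname*{ext}_{Z_{t+1}^N} \operatorname*{ext}_{X_N} S^\theta(X_N, Y_N) \tag{3.5}$$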

Since we have assumed $S^\theta$ is positive definite in $U_t^{N-1}, X_N, Z_{t+1}^N$ when $\theta > 0$, the order of the minimizations
can certainly be interchanged. We also assume $S^\theta$ is positive definite in $U_t^{N-1}$ and negative definite in
$X_N, Z_{t+1}^N$ when $\theta < 0$, so $S^\theta$ possesses a saddle point and the operations of min and max commute [1]. □

The recursion of the quadratic function $S^\theta$ in (3.5) can be decomposed into a forward recursion $P_t$ and a
backward recursion $F_t$. In fact, the ability to decompose $S^\theta$ in such a way is the key in solving the general
centralized discrete time LEG problem because it allows for a separation of the control algorithm from the
estimation algorithm. In essence, this says that in some sense the separation principle [21] holds for this
class of problems. Let

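$$S_1(X_t, Y_t) = \theta^{-1}(x_0 - \bar{x}_0)^T V_0^{-1}(x_0 - \bar{x}_0) + \sum_{k=0}^{t-1}\left(x_k^T Q_k x_k + u_k^T R_k u_k + \theta^{-1}\left[n_k(x_{k+1}, x_k, u_k) + m_k(z_k, x_k)\right]\right) + \theta^{-1} m_t(z_t, x_t) \tag{3.6}$$

$$S_2(X_t^N, Z_{t+1}^N, U_t^{N-1}) = \sum_{k=t}^{N-1}\left(x_k^T Q_k x_k + u_k^T R_k u_k + \theta^{-1} n_k(x_{k+1}, x_k, u_k)\right) + \theta^{-1} \sum_{k=t+1}^{N} m_k(z_k, x_k) + x_N^T Q_N x_N \tag{3.7}$$

where $S_1$ is named "past stress" and $S_2$ is named "future stress" in [21].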

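Since $S_2$ does not contain $X_{t-1}$ and $S_1$ does not contain $U_t^{N-1}, X_{t+1}^N, Z_{t+1}^N$, and by Theorem 3.1 the order
of the optimization is irrelevant, (3.5) can be decomposed as

$$\min_{U_t^{N-1}} \operatorname*{ext}_{Z_{t+1}^N} \operatorname*{ext}_{X_N} S^\theta(X_N, Y_N) = \operatorname*{ext}_{x_t}\Big[\operatorname*{ext}_{X_{t-1}} S_1(X_t, Y_t) + \min_{U_t^{N-1}} \operatorname*{ext}_{X_{t+1}^N} \operatorname*{ext}_{Z_{t+1}^N} S_2(X_t^N, Z_{t+1}^N, U_t^{N-1})\Big] \tag{3.8}$$

Define $P_t$ and $F_t$ as follows,

$$P_t(x_t, Y_t) = \operatorname*{ext}_{X_{t-1}} S_1(X_t, Y_t) \tag{3.9}$$

$$F_t(x_t) = \min_{U_t^{N-1}} \operatorname*{ext}_{X_{t+1}^N} \operatorname*{ext}_{Z_{t+1}^N} S_2(X_t^N, Z_{t+1}^N, U_t^{N-1}) \tag{3.10}$$

Also observe that the second summation term of $S_2$ simply vanishes when extremizing with respect to
$Z_{t+1}^N$, since extremizing $S_2$ with respect to $z_k$ for all $k \in \{t+1, t+2, \ldots, N\}$ yields $z_k^* = H_k x_k$, implying that
$\sum_{k=t+1}^{N} m_k = 0$.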

It also follows immediately from the above definitions of $P_t$ and $F_t$ that $P_t(x_t, Y_t)$ satisfies the following
forward recursion,

$$P_{t+1}(x_{t+1}, Y_{t+1}) = \operatorname*{ext}_{x_t}\left[P_t(x_t, Y_t) + x_t^T Q_t x_t + u_t^T R_t u_t + \theta^{-1}(n_t + m_{t+1})\right]$$

with the initial condition,

$$P_0(x_0, Y_0) = \theta^{-1}(x_0 - \bar{x}_0)^T V_0^{-1}(x_0 - \bar{x}_0) + \theta^{-1} m_0$$

and $F_t(x_t)$ satisfies the backward recursion,

$$F_t(x_t) = \min_{u_t} \operatorname*{ext}_{x_{t+1}}\left[F_{t+1}(x_{t+1}) + x_t^T Q_t x_t + u_t^T R_t u_t + \theta^{-1} n_t\right] \tag{3.11}$$

with the terminal condition $F_N(x_N) = x_N^T Q_N x_N$. From the above discussion, Theorem 3.2 is obtained, which
is an alternative form of Theorem 3.1. Theorem 3.2 assumes the centralized LEG problem with the classical
information pattern and is different from Theorem 2 in [21], which considers the centralized LEG problem
with the one-step delayed information pattern.

Theorem 3.2: Let $u^*(x_t, t)$ be the minimizing value of $u_t$ in the recursion equation $F_t(x_t)$, and let $x_t^*(Y_t)$
be the value of $x_t$ extremizing $P_t(x_t, Y_t) + F_t(x_t)$. Then, the optimal control at time $t$ is given by $u^*(x_t^*(Y_t), t)$.

Proof: Since $S_2$ does not contain $X_{t-1}$ and $S_1$ does not contain $U_t^{N-1}, X_{t+1}^N, Z_{t+1}^N$, and by Theorem 3.1
the order of the optimization is irrelevant, then from (3.8) to (3.10)

$$\min_{U_t^{N-1}} \operatorname*{ext}_{Z_{t+1}^N} \operatorname*{ext}_{X_N} S^\theta(X_N, Y_N) = \operatorname*{ext}_{x_t}\Big[\operatorname*{ext}_{X_{t-1}} S_1(X_t, Y_t) + \min_{U_t^{N-1}} \operatorname*{ext}_{X_{t+1}^N} \operatorname*{ext}_{Z_{t+1}^N} S_2\Big] = \operatorname*{ext}_{x_t}\left[P_t + F_t\right]$$

If the state is completely observed, then the optimal control is $u_t = u^*(x_t, t)$ and is determined from the
backward recursion $F_t(x_t)$ in (3.11). The optimal control in the case of imperfect state observation is
obtained simply by replacing $x_t$ by $x_t^*(Y_t)$, where $x_t^*(Y_t)$ is the value of $x_t$ extremizing $P_t(x_t, Y_t) + F_t(x_t)$. This
replacement is a modified version of certainty equivalence [21]. □

In the following, first the controller $u^*(x_t, t)$ is determined from the backward recursion $F_t(x_t)$ in Theorem
3.3, then the forward recursion $P_t(x_t, Y_t)$ is evaluated in Theorem 3.4A, and finally the optimal controller
at time $t$, $u^*(x_t^*(Y_t), t)$, is computed in Theorem 3.5.

In fact, the following Theorems 3.3, 3.4 and 3.5 are taken from Whittle [21]. More detailed proofs than
Whittle gave in [21] are given in [9]. The following Theorem essentially presents the results of solving the
backward recursion $F_t(x_t)$.

Theorem 3.3: $F_t(x_t)$ is quadratic in the state variable $x_t$, $F_t(x_t) = x_t^T \Pi_t x_t$, and the optimal control for
complete state information is linear, $u^*(x_t, t) = k_t x_t$, where the controller gain $k_t$ satisfies

$$k_t = -R_t^{-1} B_t^T \Pi_{t+1}\left(I + B_t R_t^{-1} B_t^T \Pi_{t+1} + \theta W_t \Pi_{t+1}\right)^{-1} A_t \tag{3.12}$$

and $\Pi_t$ satisfies the discrete time Riccati equation,

$$\Pi_t = Q_t + A_t^T \Pi_{t+1}\left(I + B_t R_t^{-1} B_t^T \Pi_{t+1} + \theta W_t \Pi_{t+1}\right)^{-1} A_t, \quad \Pi_N = Q_N$$

For these assertions to be true when $\theta < 0$, it is necessary that $\Pi_{s+1} < -(\theta W_s)^{-1}$ for all $s \ge t$.

Proof. See [21] or [9].
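
To make the backward recursion of Theorem 3.3 concrete, the following sketch specializes (3.12) and the Riccati equation to a scalar, time-invariant system. The function name, the argument names, and the choice of a time-invariant scalar model are assumptions made for this illustration; they do not appear in the original text. For $\theta < 0$ the code checks $1 + \theta W \Pi_{t+1} > 0$, which is the scalar form of the stated condition $\Pi_{t+1} < -(\theta W)^{-1}$.

```python
def backward_riccati(a, b, q, r, w, theta, q_final, horizon):
    """Scalar, time-invariant instance of (3.12) and the Riccati recursion.

    Returns (gains, pis) with gains[t] = k_t for t = 0..horizon-1 and
    pis[t] = Pi_t for t = 0..horizon.
    """
    pis = [0.0] * (horizon + 1)
    gains = [0.0] * horizon
    pis[horizon] = q_final                      # terminal condition Pi_N = Q_N
    for t in range(horizon - 1, -1, -1):
        pi_next = pis[t + 1]
        # Scalar form of Pi_{t+1} < -(theta * W)^{-1}, needed when theta < 0.
        if theta < 0.0 and 1.0 + theta * w * pi_next <= 0.0:
            raise ValueError("condition of Theorem 3.3 violated at stage %d" % t)
        denom = 1.0 + b * b * pi_next / r + theta * w * pi_next
        gains[t] = -(b / r) * pi_next * a / denom
        pis[t] = q + a * pi_next * a / denom
    return gains, pis


# One-stage comparison of theta = 0 and a risk-averse choice theta < 0.
gains_neutral, pis_neutral = backward_riccati(1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1)
gains_averse, pis_averse = backward_riccati(1.0, 1.0, 1.0, 1.0, 1.0, -0.1, 1.0, 1)
```

Setting `theta` to zero removes the $\theta W \Pi$ term and leaves the familiar linear-quadratic recursion, while in this example a negative `theta` enlarges both $\Pi_t$ and the magnitude of the gain.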




Remark: Notice $\Pi_t$ is positive semidefinite when $\theta > 0$. From the backward Riccati equation, if
$I + \theta W_t \Pi_{t+1} > 0$, then $\Pi_t \ge 0$ when $\theta < 0$.

The recursion equation for $P_t(x_t, Y_t)$ is rewritten in the following form,

$$P_{t+1}(x_{t+1}, Y_{t+1}) = \operatorname*{ext}_{x_t}\left[\bar{P}_t(x_t, \bar{Y}_t) + x_t^T Q_t x_t + u_t^T R_t u_t + \theta^{-1}\left(n_t(x_{t+1}, x_t, u_t) + m_t(z_t, x_t)\right)\right] + \theta^{-1} m_{t+1}(z_{t+1}, x_{t+1}) \tag{3.13}$$

where

$$\bar{P}_t(x_t, \bar{Y}_t) = \operatorname*{ext}_{X_{t-1}}\Big[\theta^{-1}(x_0 - \bar{x}_0)^T V_0^{-1}(x_0 - \bar{x}_0) + \sum_{k=0}^{t-1}\left(x_k^T Q_k x_k + u_k^T R_k u_k + \theta^{-1}(n_k + m_k)\right)\Big] \tag{3.14}$$

The reason for rewriting the $P_t$ recursion as,

$$P_t(x_t, Y_t) = \bar{P}_t(x_t, \bar{Y}_t) + \theta^{-1} m_t(z_t, x_t) \tag{3.15}$$

is that the recursion equation for $\bar{P}_t$ is exactly the same as the equation for $P_t$ in Whittle's paper. Therefore,
we can invoke Theorem 4 in [21] to obtain an expression for $\bar{P}_t(x_t, \bar{Y}_t) = P_t(x_t, Y_t) - \theta^{-1} m_t(z_t, x_t)$, which will
be needed in Theorem 3.5 to obtain the optimal controller with the classical information. A less complete
form of this theorem without the explicit form of $L_{t+1}(\bar{Y}_{t+1})$ (3.18) was originally given in [21]. However,
as we shall see in Section 6, in order to obtain the optimal decentralized controller with the one-step delayed
information-sharing pattern we need an explicit expression for $L_{t+1}(\bar{Y}_{t+1})$ (3.18).

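Theorem 3.4: $\bar{P}_t(x_t, \bar{Y}_t)$ has the form

$$\bar{P}_t(x_t, \bar{Y}_t) = \theta^{-1}(x_t - \hat{x}_t)^T V_t^{-1}(x_t - \hat{x}_t) + L_t(\bar{Y}_t) \tag{3.16}$$

where $L_t(\bar{Y}_t)$ contains all those terms independent of $x_t$ and the matrix $V_t$ and the vector $\hat{x}_t$ satisfy the
recursions,

$$V_{t+1} = W_t + A_t\left(V_t^{-1} + H_t^T \Theta_t^{-1} H_t + \theta Q_t\right)^{-1} A_t^T \tag{3.17}$$

$$\begin{aligned}
L_{t+1}(\bar{Y}_{t+1}) &= L_t(\bar{Y}_t) + \hat{x}_t^T (\theta V_t)^{-1} \hat{x}_t + z_t^T (\theta\Theta_t)^{-1} z_t + u_t^T R_t u_t \\
&\quad - \theta^{-1}\left(V_t^{-1}\hat{x}_t + H_t^T \Theta_t^{-1} z_t\right)^T \left(V_t^{-1} + H_t^T \Theta_t^{-1} H_t + \theta Q_t\right)^{-1} \left(V_t^{-1}\hat{x}_t + H_t^T \Theta_t^{-1} z_t\right)
\end{aligned} \tag{3.18}$$

$$\hat{x}_{t+1} = A_t \hat{x}_t + B_t u_t + A_t\left(V_t^{-1} + H_t^T \Theta_t^{-1} H_t + \theta Q_t\right)^{-1}\left(H_t^T \Theta_t^{-1}(z_t - H_t \hat{x}_t) - \theta Q_t \hat{x}_t\right) \tag{3.19}$$

with the initial condition $\mathrm{cov}(x_0) = V_0$, $L_0 = 0$, $\hat{x}_0 = \bar{x}_0$. Moreover, for these assertions to be true in the
case $\theta < 0$, it is necessary that $V_s^{-1} + \theta Q_s > 0$ for all $s < t + 1$.

Proof. Proof of this theorem, especially (3.18), is given in Appendix A or in [9].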

The following lemma will show the relationship of two inequalities that were used in the proof of Theorem
3.4 and the positive-definite property of $V_t$.

Lemma 3.1: The following inequality exists

$$V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t + A_t^T W_t^{-1} A_t > 0 \tag{3.20}$$

if and only if

$$V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t > 0 \tag{3.21}$$

for $\theta \ne 0$ where $t \in \{0, \ldots, N-1\}$. The existence of (3.21) implies that $V_{t+1} > 0$.

Proof: See Appendix B.

From Theorem 3.4, $\hat{x}_t$ is the value which optimizes $\bar{P}_t(x_t, \bar{Y}_t)$. The next theorem will find $\tilde{x}_t$, which
optimizes $P_t(x_t, Y_t)$.

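Theorem 3.4A: $P_t(x_t, Y_t)$ has the form

$$P_t(x_t, Y_t) = \theta^{-1}(x_t - \tilde{x}_t)^T \tilde{V}_t^{-1}(x_t - \tilde{x}_t) + \tilde{L}_t(Y_t)$$

where $\tilde{L}_t(Y_t)$ contains all those terms independent of $x_t$, and the matrix $\tilde{V}_t$ and the vector $\tilde{x}_t$ satisfy the
recursion,

$$V_{t+1} = W_t + A_t\left(\tilde{V}_t^{-1} + \theta Q_t\right)^{-1} A_t^T, \quad \tilde{V}_{t+1} = \left(V_{t+1}^{-1} + H_{t+1}^T \Theta_{t+1}^{-1} H_{t+1}\right)^{-1} \tag{3.22}$$

$$\begin{aligned}
\tilde{L}_t(Y_t) &= L_t(\bar{Y}_t) + \hat{x}_t^T\left[\theta V_t + \left[H_t^T (\theta\Theta_t)^{-1} H_t\right]^{-1}\right]^{-1} \hat{x}_t \\
&\quad + z_t^T\left[\theta\Theta_t + H_t (\theta V_t) H_t^T\right]^{-1} z_t - 2\hat{x}_t^T (\theta V_t)^{-1} (\theta\tilde{V}_t) H_t^T (\theta\Theta_t)^{-1} z_t
\end{aligned}$$

$$\tilde{x}_t = \tilde{V}_t\left(V_t^{-1}\hat{x}_t + H_t^T \Theta_t^{-1} z_t\right) = \hat{x}_t + \tilde{V}_t H_t^T \Theta_t^{-1}(z_t - H_t \hat{x}_t) \tag{3.23}$$

$$\hat{x}_{t+1} = A_t\left(I + \theta\tilde{V}_t Q_t\right)^{-1} \tilde{x}_t + B_t u_t$$

with the initial condition $\tilde{V}_0 = (V_0^{-1} + H_0^T \Theta_0^{-1} H_0)^{-1}$ and $\tilde{x}_0 = (V_0^{-1} + H_0^T \Theta_0^{-1} H_0)^{-1}(V_0^{-1}\bar{x}_0 + H_0^T \Theta_0^{-1} z_0)$,
where $V_t$, $\hat{x}_t$ are given in Theorem 3.4. Moreover, for these assertions to be true in the case $\theta < 0$, it is
necessary that $\tilde{V}_s^{-1} + \theta Q_s > 0$ for all $s < t + 1$.

Proof. The proof is similar to that of Theorem 3.4 (see [5]).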
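
The two estimator forms can be cross-checked numerically. The sketch below works in the scalar case and propagates the estimate one step in two ways: directly through (3.17) and (3.19), and through the measurement correction (3.23) followed by the prediction step of Theorem 3.4A. All function and argument names are hypothetical and chosen for this illustration, and `obs_var` stands for the scalar $\Theta_t$. The two routes should return the same $\hat{x}_{t+1}$ and $V_{t+1}$.

```python
def delayed_update(x_hat, v, z, u, a, b, h, q, w, obs_var, theta):
    """Scalar form of (3.17) and (3.19): returns (x_hat_next, v_next)."""
    m = 1.0 / v + h * h / obs_var + theta * q
    if m <= 0.0:
        raise ValueError("V^{-1} + H^T Theta^{-1} H + theta Q must be positive")
    v_next = w + a * a / m
    innovation = h * (z - h * x_hat) / obs_var - theta * q * x_hat
    x_next = a * x_hat + b * u + a * innovation / m
    return x_next, v_next


def measurement_correction(x_hat, v, z, h, obs_var):
    """Scalar form of (3.23) with V_tilde = (V^{-1} + H^T Theta^{-1} H)^{-1}."""
    v_tilde = 1.0 / (1.0 / v + h * h / obs_var)
    x_tilde = x_hat + v_tilde * h * (z - h * x_hat) / obs_var
    return x_tilde, v_tilde


def predict_from_corrected(x_tilde, v_tilde, u, a, b, q, w, theta):
    """Scalar form of (3.22) and the prediction step of Theorem 3.4A."""
    v_next = w + a * a / (1.0 / v_tilde + theta * q)
    x_next = a * x_tilde / (1.0 + theta * v_tilde * q) + b * u
    return x_next, v_next


# Illustrative numbers only.
params = dict(a=0.9, b=0.5, q=1.0, w=0.2, theta=-0.2)
direct = delayed_update(1.0, 2.0, 1.5, 0.3, h=1.0, obs_var=0.5, **params)
x_tilde, v_tilde = measurement_correction(1.0, 2.0, 1.5, 1.0, 0.5)
two_step = predict_from_corrected(x_tilde, v_tilde, 0.3, **params)
```

With `theta` set to zero both routes reduce to the ordinary Kalman filter prediction; a nonzero `theta` enters only through the $\theta Q_t$ terms.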

Notice that the control gain $k_t$ in (3.12) contains $\theta W_t$ and the state estimate $\hat{x}_t$ (3.19) contains $\theta Q_t$. The
control gain depends on the covariance of the noise and the state estimate depends on the state penalties.
Nevertheless, the computation of the control law is decoupled from the computation of the state estimates,
implying that the separation principle holds even though not in the same strict sense as for the LQG problem.

We can now appeal to Theorems 3.2, 3.3, 3.4, 3.4A and the discussion leading to equations (3.13)-(3.15)
to obtain the final result.

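Theorem 3.5: The optimal control rule is $u_t = k_t x_t^*$, where

$$x_t^* = \left(I + \theta V_t \Pi_t + V_t H_t^T \Theta_t^{-1} H_t\right)^{-1}\left(\hat{x}_t + V_t H_t^T \Theta_t^{-1} z_t\right) \tag{3.24}$$

$$\phantom{x_t^*} = \left(I + \theta\tilde{V}_t \Pi_t\right)^{-1} \tilde{x}_t \tag{3.25}$$

and $k_t$, $\Pi_t$, $V_t$ and $\hat{x}_t$ are given in Theorems 3.3 and 3.4 and $\tilde{x}_t$, $\tilde{V}_t$ are given in Theorem 3.4A. The necessary
condition for (3.24) is $V_t^{-1} + \theta\Pi_t + H_t^T \Theta_t^{-1} H_t > 0$ when $\theta < 0$.

Proof: As shown in Theorem 3.2 the optimal control is

$$u^*(x_t, t)\big|_{x_t = x_t^*} = u^*(x_t^*(Y_t), t)$$

where $x_t^*$ is determined by extremizing $P_t + F_t$ with respect to $x_t$,

$$\operatorname*{ext}_{x_t}\left[P_t + F_t\right] = \operatorname*{ext}_{x_t}\left[\bar{P}_t + F_t + \theta^{-1} m_t\right] \tag{3.26}$$

Using Theorems 3.3 and 3.4, the right hand side of the above equation can be written as,

$$\operatorname*{ext}_{x_t}\left[x_t^T \Pi_t x_t + \theta^{-1}(x_t - \hat{x}_t)^T V_t^{-1}(x_t - \hat{x}_t) + L_t(\bar{Y}_t) + \theta^{-1}(z_t - H_t x_t)^T \Theta_t^{-1}(z_t - H_t x_t)\right] \tag{3.27}$$

Performing the indicated operation yields (3.24). Using Theorems 3.3 and 3.4A, the left hand side of (3.26)
can be written as

$$\operatorname*{ext}_{x_t}\left[x_t^T \Pi_t x_t + \theta^{-1}(x_t - \tilde{x}_t)^T \tilde{V}_t^{-1}(x_t - \tilde{x}_t) + \tilde{L}_t(Y_t)\right]$$

where the optimal value of $x_t$ is given by (3.25). Therefore, $x_t^*$ is obtained in (3.24) and (3.25), provided
$(\theta V_t)^{-1} + \Pi_t + H_t^T (\theta\Theta_t)^{-1} H_t < 0$ when $\theta < 0$. Using (3.22) and (3.23), it is easy to show that the two $x_t^*$ are the
same. Finally, from Theorem 3.2, $u^*(x_t^*, t) = k_t x_t^*$, and we can thus deduce the Theorem. □

The form for the classical information pattern in (3.25) is similar to that of the one-step delayed
information pattern in [21], only $V_t$, $\hat{x}_t$ are changed to $\tilde{V}_t$, $\tilde{x}_t$. The form in (3.24) is an affine function which will
be needed in Theorem 5.1.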
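
The equivalence of (3.24) and (3.25) is easy to check numerically in the scalar case. The sketch below is illustrative only; the function and argument names are assumptions, `obs_var` stands for the scalar $\Theta_t$, and `pi` stands for $\Pi_t$ (not the mathematical constant).

```python
def x_star_from_prior(x_hat, v, z, h, obs_var, pi, theta):
    """Scalar form of (3.24), written with the one-step delayed quantities."""
    denom = 1.0 + theta * v * pi + v * h * h / obs_var
    # Scalar form of V^{-1} + theta * Pi + H^T Theta^{-1} H > 0 (given v > 0).
    if denom <= 0.0:
        raise ValueError("necessary condition for (3.24) violated")
    return (x_hat + v * h * z / obs_var) / denom


def x_star_from_posterior(x_hat, v, z, h, obs_var, pi, theta):
    """Scalar form of (3.25), written with the measurement-corrected quantities."""
    v_tilde = 1.0 / (1.0 / v + h * h / obs_var)
    x_tilde = x_hat + v_tilde * h * (z - h * x_hat) / obs_var
    return x_tilde / (1.0 + theta * v_tilde * pi)


# Illustrative numbers only: x_hat = 1, V = 2, z = 1.5, H = 1, Theta = 0.5, Pi = 1.5.
args = (1.0, 2.0, 1.5, 1.0, 0.5, 1.5)
x_a = x_star_from_prior(*args, theta=-0.2)
x_b = x_star_from_posterior(*args, theta=-0.2)
x_neutral = x_star_from_posterior(*args, theta=0.0)
```

For these numbers both forms give the same $x_t^*$, and with `theta` equal to zero the result is the measurement-corrected estimate $\tilde{x}_t$ itself; a negative `theta` moves $x_t^*$ away from $\tilde{x}_t$ in proportion to the future cost weight $\Pi_t$.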

Discussion  of  Sufficient  and  Necessary  Conditions:  The  sufficient  conditions  given  in  Theorem  3.1  are 
different  from  the  necessary  conditions  given  in  Theorems  3.3-3.5.  Below  we  show  that  the  necessity  of 
Theorems  3.3-3.5  are  also  sufficient. 


(3.24) 

(3.25) 

.  The  necessary 
Theorem 3.6: There exists a unique finite optimal control function, which is given in Theorem 3.5 and yields finite cost, if and only if the following three inequalities hold for all $t \in \{0, \dots, N-1\}$:

1. $V_s^{-1} + \theta Q_s + H_s^T \Theta_s^{-1} H_s > 0$ for all $s < t$ when $\theta < 0$.

2. $V_t^{-1} + \theta \Pi_t + H_t^T \Theta_t^{-1} H_t > 0$ when $\theta < 0$.

3. $\theta W_s \Pi_{s+1} + I > 0$ for all $N > s \ge t$ when $\theta < 0$.

Proof. In order to obtain the optimal control function, $F_t$, $P_t$ in Theorems 3.3 and 3.4, and $x_t^*$ in (3.24) must exist. Condition 1 guarantees the existence of $P_t$. Condition 2 is for the existence of $x_t^*$ in (3.24). Condition 3 is from Theorem 3.3 and guarantees the existence of $F_t$ in (3.11).

The necessary conditions are proved by contradiction. Notice that $S^\theta$ can be decoupled into four parts. The first part is from $S_1$ (3.6) and (A.3) (see [9]), the second part is from Theorem 3.5, and the third and fourth parts are from (3.7), (3.10) and (3.11).

t-i 

S'  =  y;(z*  -  z;)^d-‘(Vfc-’  +  dQ*  +  Ak)(xk  -  xl) 

kseO 

+  (xt  -  z;)^(n.  +  (dVt)-‘  +  Hl{eet)-^Ht){xt  -  x,*) 

A^-l 

+  ^  (Xfc+1  -  X*+3)’^ln/k+l  +  (dW'/t)~‘](Xfc+i  -  ij+i) 

A'-l 

+  (“*  -  **:f(«*  +  ^2'nfc+,(dJv*nfc+,  +  /)-*B*)(tt*  ~  ui) 

k=t 

where 

**(*  <  <)  =  (v;-* +(fQk+ hIqz^Hh + 

■(V*-'zfc  +  -  BkU*)) 

xt*  =  (/  +  dKnt)-‘x, 

x;^,(fc  >  0  =  («n*+,  +  M'-»)-^H7»(>4fcX* + s*u*) 

K{k  >  t)  =  -(ft*  +  B2’n*+i(dvVfcnk+i  +  /)-‘fi*)-* 

•B2'nfc+,(diVfcnfc+,  + /)-Mfcx* 

FVom  Lemma  3.1  +  0Qk  +  >  0  if  and  only  if  +  d(?*  +  /fjG^ ‘//a  >  0, 

therefore  the  form  in  condition  1  is  obtained.  Condition  2  is  from  the  second  part.  Because  condition  3 
implies  Ila+i  >  0  and  F*  +  Bjnfc+i(dlVfcnfc+i  +  /)“'Bfc  >  0,  only  condition  3  is  needed  for  parts  3  and  4. 
If one of the three inequalities does not hold, then

$$J = \min \int_{-\infty}^{\infty} (-\theta)\, e^{-\frac{\theta}{2} S^\theta}\, dX_N\, dZ_N = \pm\infty$$

Therefore, the expected value of the cost function would be positive or negative infinity, which establishes the necessity of these conditions. □
Remark: Notice that the three inequalities always hold when $\theta > 0$. Also, the three inequalities hold if and only if the assumption on $S^\theta$ in Theorem 3.1 holds. Furthermore, the following relation associated with the gain (3.12) can be proved:

{Rt  +  +  I)-Ut 

Theorem 3.5 provides the discrete-time centralized controller with the classical information pattern for a finite horizon and time-varying coefficients. For the time-invariant, infinite horizon case under certain conditions, a time-invariant controller results. It is shown in [6] that this time-invariant controller is equivalent to an $H_\infty$ controller, and the best $H_\infty$ controller is produced when $\theta$ is decreased to a critical value where the cost goes to infinity. However, the information pattern is not discussed there. The controllers with the classical information or one-step delayed information patterns may be time-varying or time-invariant and finite horizon or infinite horizon. Therefore, the controllers here generalize the $H_\infty$ controller to a larger class of controllers.

4.  DYNAMIC  PROGRAMMING  FOR  THE  DECENTRALIZED  CONTROL  PROBLEM 

This section begins by presenting a variation on the Dynamic Programming recursion given in Section 2.2. The difference is that instead of conditioning on the classical information pattern $Y_t$ as in (2.18), the recursion is conditioned on the one-step delayed information pattern $\bar{Y}_t$. This change allows the explicit formulation of the static team problem at each stage of the recursion. The analysis is the following. First, the Dynamic Programming recursion is obtained conditioned on $\bar{Y}_t$. From the Dynamic Programming, a recursion associated with the argument of the exponential, $\Sigma_t(Y_t)$, is defined in (4.1). Note that $\Sigma_t(Y_t)$ and its recursion replace $\Phi_t(Y_t)$ (2.18) and its recursion (2.19). $\Sigma_t(Y_t)$ is propagated backwards assuming that the decentralized controller is affine, which is shown to imply that it is quadratic at each stage time. In particular, in Section 4.1 it is shown that if $\Sigma_{t+1}(Y_{t+1})$ is quadratic, then the global optimal decentralized controller $u^*$ is affine. Since $\Sigma_N(Y_N)$ is quadratic, then by induction $\Sigma_t(Y_t)$ will remain quadratic.

A deeper property of $\Sigma_t(Y_t)$, shown in Section 5.1, is that the quadratic form of $\Sigma_t(Y_t)$ at each time stage is independent of the information pattern. This is because the saddle point strategies for all the information patterns produce the same saddle point trajectory, which is used to construct $\Sigma_t(Y_t)$. From this property a uniform approach is produced for the development of controllers with different information patterns, a considerable simplification over that given in [10].

The  optimal  return  function  is  defined,  rather  than  as  in  Section  2.2,  as 

=  min  E[-ee-i^*\Yt+t]f{Zt\Ut) 
Then, the Dynamic Programming recursion rule changes from $J_t(Y_t)$ in (2.15) to

=  min  f  Jt^i(Yt+i)dzt  =  f  Jt+i{Yt)dzt 

“*  j  — OO  j — oo 

In Lemma 5.1 the assumption that the controller is affine is used, and forms the basis for the propagation of the quadratic function $\Sigma_t$. The relationship between $J_t(Y_t)$ and $\Sigma_t(Y_t)$ is given as

$$J_t(Y_t) \propto \exp[-\tfrac{\theta}{2} \Sigma_t(Y_t)], \qquad \Sigma_t(Y_t) = \mathrm{ext}_{z_t} \min_{u_t} \Sigma_{t+1}(Y_{t+1}) \tag{4.1}$$

where $\Sigma_{t+1}(Y_{t+1})$ is a quadratic function of $z_t$ and $u_t$, and $\Sigma_N = \mathrm{ext}_{x_N} \Phi_N(Y_N)$. From (2.19) and (4.1), the following relation is obtained.

$$\Sigma_t(Y_t) = \mathrm{ext}_{x_t} \Phi_t(Y_t) \tag{4.2}$$

From (6.1) and (6.2), $\Sigma_{t+1}(Y_{t+1})$ can be decoupled into two parts

$$\Sigma_{t+1}(Y_{t+1}) = f_t(u_t, z_t, \bar{x}_t) + L_t(Y_t) \tag{4.3}$$

where $L_t(Y_t)$ is in (3.18) and $f_t(u_t, z_t, \bar{x}_t)$ will be given in (6.2). In the next section it is shown that the static team controller which globally minimizes $f_t$ is affine. Therefore, starting with $\Sigma_N$, $\Sigma_t$ is propagated backwards by the recursion (4.1) and by induction $\Sigma_t$ remains quadratic.

4.1  The  static  team  problem 

To obtain the static optimal team controller we need a definition of person-by-person optimality. Then, it is shown that, with an additional assumption on the convexity of the quadratic function $f_t$, person-by-person optimality implies global team optimality. Although the cost $C$ of Definition 1.1 is not explicitly defined, it is related to the optimal return function $J_{t+1}$ as

min£;(Ct(ut(i,*t,x,),Zf,i,,yt)li't,zJ)  =  ">m  f  Jt+i(yt,Zt,ut(t>«t.xt))dz,(i)  (4.4) 

uj  «;  J-ao 


oc.  min 


•  r 

J-c 


exp[--eft{utii,^t,xt),zt,xt)]dzt{i) 


«  cip(--&min  ext  fA 

2  «;  fi(«) 


The  last  relation  is  from  Lemma  2.1  and  ^f(0  —  ,  where  zt(i)  is  the  vector  zt  without  the  observation 

of  the  i-th  team  member,  and 


«t(*.*t.X|) 


{ 


uj  if  i=j 


Note  that  in  [15]  and  [13]  person-by-person  optimality  is  defined  with  respect  to  the  left  side  of  (4.4).  Below 
we  define  person-by-person  optimality  with  respect  to  the  right  side  of  (4.4). 

In  the  discussion  of  the  static  team  problem,  the  subscript  t  is  dropped  for  convenience. 
Definition 4.1:

A static team decision rule $u^* \in U_T$ is person-by-person optimal if for

$$J = f(u, z, x), \qquad |J| < \infty$$

and if $u^{i*}$ is determined from

$$\min_{u^i \in U_T^i}\ \mathrm{ext}_{z(i)}\ f(u(i, z, x), z, x), \qquad i \in \{1, 2, \dots, M\}$$

□

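Let $f$ have the particular quadratic form

$$f(u, z, x) = x^T Q_{11} x + 2 x^T Q_{12} z + 2 x^T Q_{13} u + z^T Q_{22} z + 2 z^T Q_{23} u + u^T Q_{33} u \tag{4.5}$$

where $x$ is a constant vector and $Q_{11}$, $Q_{12}$, $Q_{13}$, $Q_{22}$, $Q_{23}$ and $Q_{33}$ are defined in (6.3) and are functions of $\theta$, a real number with $\theta \neq 0$. The operator "ext" means "min" when $\theta > 0$ and "max" when $\theta < 0$. The vectors $u$ and $z$ are partitioned as $u = (u^{1T}, \dots, u^{MT})^T$ and $z = (z^{1T}, \dots, z^{MT})^T$, and the matrices $Q_{12}$, $Q_{13}$, $Q_{22}$, $Q_{23}$ and $Q_{33}$ are partitioned into block form, to correspond to the partitioning of the vectors $u$ and $z$, as

$$Q_{12} = (Q_{12}^1, \dots, Q_{12}^M), \qquad Q_{13} = (Q_{13}^1, \dots, Q_{13}^M)$$

$$Q_{pp} = \begin{pmatrix} Q_{pp}^{11} & \cdots & Q_{pp}^{1M} \\ \vdots & & \vdots \\ Q_{pp}^{M1} & \cdots & Q_{pp}^{MM} \end{pmatrix}, \qquad pp \in \{22, 23, 33\}$$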

Let us also denote

$$R = D^T Q_{33} D + Q_{23} D + (Q_{23} D)^T + Q_{22} \tag{4.6}$$

$$\delta = D^T Q_{33} C + (Q_{13} D)^T + Q_{23} C + Q_{12}^T \tag{4.7}$$

where $D$ and $C$ are determined from (4.8) and (4.9).
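The role of $R$ and $\delta$ can be checked numerically. The following Python sketch (NumPy; the dimensions, the random matrices, and the name `Q11_tilde` are illustrative assumptions, not taken from the report) substitutes an arbitrary affine rule $u = Cx + Dz$ into the quadratic form (4.5) and confirms that the result equals $z^T R z + 2 z^T \delta x + x^T \tilde{Q}_{11} x$, with $R$ and $\delta$ as read from (4.6) and (4.7).

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, p = 3, 2, 2          # illustrative sizes of x, z, u (assumed)

def sym(m):
    """Return the symmetric part of a square matrix."""
    return (m + m.T) / 2

# Random weights for the quadratic form (4.5); the values are arbitrary.
Q11 = sym(rng.normal(size=(n, n)))
Q22 = sym(rng.normal(size=(r, r)))
Q33 = sym(rng.normal(size=(p, p)))
Q12 = rng.normal(size=(n, r))
Q13 = rng.normal(size=(n, p))
Q23 = rng.normal(size=(r, p))
C = rng.normal(size=(p, n))   # any affine rule u = Cx + Dz
D = rng.normal(size=(p, r))

def f(u, z, x):
    """Quadratic form (4.5)."""
    return (x @ Q11 @ x + 2 * x @ Q12 @ z + 2 * x @ Q13 @ u
            + z @ Q22 @ z + 2 * z @ Q23 @ u + u @ Q33 @ u)

R = D.T @ Q33 @ D + Q23 @ D + (Q23 @ D).T + Q22            # (4.6)
delta = D.T @ Q33 @ C + (Q13 @ D).T + Q23 @ C + Q12.T      # (4.7)
Q11_tilde = C.T @ Q33 @ C + Q13 @ C + (Q13 @ C).T + Q11    # x-only part

x, z = rng.normal(size=n), rng.normal(size=r)
direct = f(C @ x + D @ z, z, x)
reduced = z @ R @ z + 2 * z @ delta @ x + x @ Q11_tilde @ x
print(np.isclose(direct, reduced))
```

Because $D$ differs between information patterns, $R$ and $\delta$ differ too, even though the same six weighting matrices are used throughout.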

It  is  shown  in  Lemma  4.1  that  the  person-by-person  optimal  team  rule  for  the  quadratic  function  /  is 
affine.  In  Lemma  4.2,  by  completing  the  square  and  using  a  stronger  assumption  than  required  for  person- 
by-person  optimality,  the  affine  person-by-person  rule  is  also  globally  optimal. 

Lemma 4.1: Let the quadratic function $f$ be given as (4.5). If $Q_{33}^{ii} - (\beta^i)^T \alpha^i \beta^i > 0$, $R > 0$ when $\theta > 0$, and if $Q_{33}^{ii} - (\beta^i)^T \alpha^i \beta^i > 0$, $R < 0$ when $\theta < 0$, then the optimal person-by-person controller for the one-step delayed information pattern is given by

$$u^* = Cx + Dz$$
where $D$ is a block diagonal matrix with diagonal blocks $D^i$, and $C$ is partitioned according to $u$ into $p_i \times n$ blocks $C^i$, $i \in \{1, \dots, M\}$, where

D*  =  -IQS  -  (4.8) 

CP  =  -(QS  -  +  f;  Q%C^  -  (4.9) 

where  for  i,j,  A:  €  {1,  •  •  • ,  M}, 

a*=ith  minor  of  D'^Qs^D  +  Q23D  +  (QTiD)"^  +  Qn:  the  ith  minor  does  not  depend  on  /?*, 

i‘  =  t(<3?j + 0S3D')’' + <3Sc‘ + E:K..,wi 

Furthermore, the optimal cost is

$$\mathrm{ext}_z f(u^*, z, x) = f(u^*, z^*, x) = x^T (Q_{11} + Q_{13} C + (Q_{13} C)^T + C^T Q_{33} C - \delta^T R^{-1} \delta) x \tag{4.10}$$

Proof. See Appendix C.

Remark;  Note  that  Q%  =  =  /f(i)u*  (<*‘)“‘  =  /*(<)*(«)•  additional  characterization  of 

and  S',  see  equations  from  (C.3)  to  (C.6). 

Remark:  The  optimal  person-by-person  control  function  is  a  stationary  point  for  the  function  /. 

Lemma 4.2: If $Q_{33} > 0$, the person-by-person optimal control function determined from Lemma 4.1 for the quadratic function $f(u, z, x)$ (4.5) with the one-step delayed information-sharing pattern (OSDISP) is also the unique global optimal team controller, i.e. for $U_T$ defined in Definition 1.1 and $\forall u \in U_T$, $u \neq u^*$, the inequality $f(u, z, x) > f(u^*, z, x)$ holds.

Proof. From page 187 of [1], for a strictly convex function the person-by-person optimal solution is unique and is also team-optimal. In particular, since $f$ is a quadratic function where it is assumed that $Q_{33} > 0$,

$$f(u^* + \delta u, z, x) - f(u^*, z, x) = \delta u^T Q_{33} \delta u + 2 (Q_{33}(Cx + Dz) + Q_{23}^T z + Q_{13}^T x)^T \delta u$$

By applying $u^{i*} = C^i x + D^i z^i$ to $Q_{33}(Cx + Dz) + Q_{23}^T z + Q_{13}^T x$, (C.1) is obtained. Since $z^*$ satisfies the stationary conditions and $\delta u^i$ depends on $z^i$ only, $z(i)$ in (C.1) is eliminated by using (C.2)

(Q33(Ci  +  Dz)  +  Q'^z  -1-  QX^xfSuiz) 

=  i;{Q>‘  +  l(Qi3)"’+  E  Qj&cr>lx  +  (Q&r^‘  +  ^"’*(0}"’«u‘(2) 

i=l 

M 

=  -CPx-  D'zYiQ%  -  ^V^)5u‘(2‘)  =  0 
Since $Q_{33} > 0$,

$$f(u^* + \delta u, z, x) - f(u^*, z, x) = \delta u^T Q_{33} \delta u > 0$$

Therefore, for arbitrary $\delta u \neq 0$,

$$f(u^* + \delta u, z, x) > f(u^*, z, x)$$

□

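The following Theorem shows that the optimal control $u^* \in U_T$ from Lemma 4.1 also minimizes $\int_{-\infty}^{\infty} -\theta e^{-\frac{\theta}{2} f}\, dz$.

Theorem 4.1: Let the function $f$ be a quadratic function given in (4.5) and $Q_{33} > 0$. Then, the control function $u^* \in U_T$ given in Lemma 4.1 also minimizes $-\theta e^{-\frac{\theta}{2} f}$ and $\int_{-\infty}^{\infty} -\theta e^{-\frac{\theta}{2} f}\, dz$.

Proof. Since $Q_{33} > 0$, $f$ is a strictly convex function with respect to $u$ and there exists a unique control function $u^*$ which minimizes $f$. If $u \neq u^*$, $f(u^*, z, x) < f(u, z, x)$. Because $-\theta e^{-\frac{\theta}{2} f}$ is strictly increasing in $f$ for $\theta \neq 0$, then

$$-\theta e^{-\frac{\theta}{2} f(u^*, z, x)} < -\theta e^{-\frac{\theta}{2} f(u, z, x)} \tag{4.11}$$

for $\forall u \in U_T$, $z$ and $x$. From Theorem C on page 96 of [7] and equation (4.11), we know

$$\int_{-\infty}^{\infty} -\theta e^{-\frac{\theta}{2} f(u^*, z, x)}\, dz \le \int_{-\infty}^{\infty} -\theta e^{-\frac{\theta}{2} f(u, z, x)}\, dz$$

If the equality holds, then

$$\int_{-\infty}^{\infty} \left[ -\theta e^{-\frac{\theta}{2} f(u, z, x)} + \theta e^{-\frac{\theta}{2} f(u^*, z, x)} \right] dz = 0$$

By using Theorem B on page 104 of [7],

$$-\theta e^{-\frac{\theta}{2} f(u, z, x)} = -\theta e^{-\frac{\theta}{2} f(u^*, z, x)} \quad \text{almost everywhere}$$

The above equation contradicts equation (4.11). Therefore, only the strict inequality holds

$$\int_{-\infty}^{\infty} -\theta e^{-\frac{\theta}{2} f(u^*, z, x)}\, dz < \int_{-\infty}^{\infty} -\theta e^{-\frac{\theta}{2} f(u, z, x)}\, dz$$

for all $u \in U_T$, $z$, and $x$. □

Remark: Theorem 4.1 gives the sufficient condition for the optimality of the team problem where $\theta \neq 0$. This extends the result in [13], which deals with the convex exponential function ($\theta < 0$), to a nonconvex but unimodal function ($\theta > 0$) as given in Theorem 4.1.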

5. EFFECTS OF INFORMATION PATTERN ON DYNAMIC PROGRAMMING RECURSION AND COST FUNCTION CRITERION

The main purpose of this section is to show that the recursion for $\Sigma_t$ (4.1) is independent of the information pattern and that only the coefficient of the exponential, $|R|$ with $R$ from (4.6), delineates the differences in the value of the cost.

From Theorem 3.5 and Theorem 4.1, the optimal control for the classical information and one-step delayed information-sharing patterns is an affine function. The optimal control for the one-step delayed information pattern [21] can also be expressed as an affine function $u^* = Cx + Dz$, $D = [0]$. From the above discussion, the optimal control for the three different information patterns can be written as $u^* = Cx + Dz$, where $C$ and $D$ are different for different information patterns.

As will be seen, the variable $z$ in Lemma 5.1 plays a role equivalent to $v$ in Lemma 2.1. However, because $u^* = Cx + Dz$ is explicitly dependent on $z$, some modifications need to be made to Lemma 2.1, in which $u$ and $v$ are independent. Lemma 2.1 is a special case, i.e. $D = [0]$, of Lemma 5.1. Lemma 5.1 is needed for the Dynamic Programming decomposition in Section 4 and gives the coefficient of the exponential $|R|$.

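Lemma 5.1: Let the function $f$ defined in (4.5) be a strictly convex function with respect to $u$, i.e. $Q_{33} > 0$. Let $u^*$ be the value which minimizes the function $f$ and $\int_{-\infty}^{\infty} -\theta e^{-\frac{\theta}{2} f}\, dz$, and let $u^*$ be an affine function of $z$, i.e. $u^* = Cx + Dz$. If $\theta > 0$, assume $R > 0$; if $\theta < 0$, assume $R < 0$; and $\dim(z) = r$, where $R$ is given in (4.6). Then,

$$\min_u \int_{-\infty}^{\infty} -\theta e^{-\frac{\theta}{2} f(u, z, x)}\, dz = -\theta (2\pi)^{r/2} |\theta R|^{-1/2} e^{-\frac{\theta}{2} f(u^*, z^*, x)} \tag{5.1}$$

where $z^*$ minimizes $f$ when $\theta > 0$ and $z^*$ maximizes $f$ when $\theta < 0$.

Proof. Let the function $f$ be defined as in (4.5) and let $u^*$ be the optimal $u$ which minimizes $f$ and $\int_{-\infty}^{\infty} -\theta e^{-\frac{\theta}{2} f}\, dz$. The discussion at the beginning of Section 5 indicates that the optimal control functions for the three different information patterns can be expressed as an affine function $u^* = Cx + Dz$. Notice that $C$ and $D$ are different for different information patterns. Let $R$ and $\delta$ be defined in (4.6) and (4.7) and substitute $u^* = Cx + Dz$ into $f$ of (4.5), then

$$f(u^*, z, x) = z^T R z + 2 x^T \delta^T z + x^T (C^T Q_{33} C + Q_{13} C + (Q_{13} C)^T + Q_{11}) x$$

$$\phantom{f(u^*, z, x)} = (z + R^{-1} \delta x)^T R (z + R^{-1} \delta x) + x^T (\tilde{Q}_{11} - \delta^T R^{-1} \delta) x$$

where $\tilde{Q}_{11} = C^T Q_{33} C + Q_{13} C + (Q_{13} C)^T + Q_{11}$. Also define $\tilde{z} = z + R^{-1} \delta x$, then

$$\int_{-\infty}^{\infty} -\theta e^{-\frac{\theta}{2} f(u^*, z, x)}\, dz = \int_{-\infty}^{\infty} -\theta e^{-\frac{\theta}{2} [\tilde{z}^T R \tilde{z} + x^T (\tilde{Q}_{11} - \delta^T R^{-1} \delta) x]}\, d\tilde{z} \tag{5.2}$$

$$= -\theta (2\pi)^{r/2} |\theta R|^{-1/2} e^{-\frac{\theta}{2} x^T (\tilde{Q}_{11} - \delta^T R^{-1} \delta) x} = -\theta (2\pi)^{r/2} |\theta R|^{-1/2} e^{-\frac{\theta}{2} \min_u \mathrm{ext}_z f} \tag{5.3}$$

In order to obtain (5.3) from (5.2) we need to assume $R > 0$ when $\theta > 0$ and $R < 0$ when $\theta < 0$. If the assumption is not satisfied, the integral in (5.2) will be infinite ($\theta < 0$) or negative infinite ($\theta > 0$). From Theorem 4.1 and (5.3),

$$\min_u \int_{-\infty}^{\infty} -\theta e^{-\frac{\theta}{2} f(u, z, x)}\, dz = \int_{-\infty}^{\infty} -\theta e^{-\frac{\theta}{2} f(u^*, z, x)}\, dz \propto e^{-\frac{\theta}{2} \min_u \mathrm{ext}_z f}$$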
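The Gaussian integral behind (5.3) can be checked in the scalar case. The sketch below is a hedged illustration, not the report's computation: it assumes $\dim(z) = 1$, writes $f(u^*, z, x) = R z^2 + 2 \delta x z + q x^2$ with `q` standing in for the $x$-only coefficient, and compares a Riemann-sum approximation of $\int -\theta e^{-\frac{\theta}{2} f}\, dz$ with the closed form $-\theta (2\pi)^{1/2} |\theta R|^{-1/2} e^{-\frac{\theta}{2}(q - \delta^2 / R) x^2}$. All numerical values are arbitrary.

```python
import numpy as np

def lhs(theta, R, delta, q, x, half_width=60.0, num=400001):
    """Riemann-sum approximation of the integral of -theta*exp(-theta/2*f) over scalar z."""
    z = np.linspace(-half_width, half_width, num)
    f = R * z**2 + 2.0 * delta * x * z + q * x**2
    dz = z[1] - z[0]
    return float(np.sum(-theta * np.exp(-0.5 * theta * f)) * dz)

def rhs(theta, R, delta, q, x):
    """Closed form read from (5.3) with r = dim(z) = 1."""
    f_star = (q - delta**2 / R) * x**2      # f evaluated at z* = -delta*x/R
    return (-theta * np.sqrt(2.0 * np.pi) * abs(theta * R) ** -0.5
            * np.exp(-0.5 * theta * f_star))

case_pos = (0.5, 2.0, 0.3, 1.0, 1.2)      # theta > 0 with R > 0
case_neg = (-0.5, -2.0, 0.3, 1.0, 1.2)    # theta < 0 with R < 0

for case in (case_pos, case_neg):
    print(np.isclose(lhs(*case), rhs(*case), rtol=1e-6))
```

In both cases $\theta R > 0$, which is what makes the integrand decay; flipping the sign of $R$ alone would make the integral diverge, matching the assumption stated in the proof.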
5.1 The recursion of $\Sigma_t(Y_t)$ and its independence of the information pattern

In this section we give a theorem which shows that the argument $\Sigma_t$ is the same for the classical information, one-step delayed information and one-step delayed information-sharing patterns (OSDISP). This simplifies the derivation of the control gains for OSDISP, i.e. the backward iterative algorithm given in [10] is no longer needed.

Theorem 5.1: Let $S^\theta$ be positive definite in $U_{N-1}, X_N, Z_N$ when $\theta > 0$, and let $S^\theta$ be positive definite in $U_{N-1}$ and negative definite in $X_N, Z_N$ when $\theta < 0$. Then, the value of $\Sigma_t(Y_t)$ in (4.1) is the same for the three different information patterns: the classical information, one-step delayed information, and one-step delayed information-sharing patterns.

Proof. In Section 4, the recursion from the Dynamic Programming decomposition produces (4.1) and (4.2), where $\Phi_t(Y_t)$ is given in (3.5). The order of the extremization in (3.5) is irrelevant (Theorem 3.1). By using Theorem 3.1, and the operations and definitions used in Section 3, an expression for $\Sigma_t$ is obtained for the classical information pattern as

i;,(V'j)  =  ext$t(yt)  =  min  extextS®(A^iv,  VXr) 

xt  zf  Xu 

=  extext[ext  Si(A’„yt)-l-  imn  ext  ext  S2(A’,^,  f/,^“’)] 

*.  X.  X,_,  -  X,",  Z,", 

=  extext[Pt  -f  Ft]  =  extext(Pt  +  -l-  Ft] 

=  extext(0~‘(it  -  XlVV^~^{xl  -xt)-|-  Li{Yi)  -  HtXt)  -f-iflltXt]  (5.4) 

Zt 

where $S_1$, $S_2$, $F_t$, $P_t$ are defined in (3.6)-(3.10). If $(\theta V_t)^{-1} + \Pi_t + H_t^T (\theta \Theta_t)^{-1} H_t < 0$ when $\theta < 0$ (clearly $(\theta V_t)^{-1} + \Pi_t + H_t^T (\theta \Theta_t)^{-1} H_t$ is always positive definite when $\theta > 0$), then the stationary condition of $x_t$ which optimizes (5.4) is (3.24). Substitute $x_t^*$ into (5.4)

e^tizJieet  +  HtHeVt)-^  Dt)-'  -  2xr(i?v,)-i[(^vi)->  +  nt  +  HT{9Qtr'Ht]-'HT{eetr^zt 

+  xJ[9Vt  (Dt  HTieet)-^Htr^]-^Xt  -H  Lt{Yt))  (5.5) 

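Assume $\theta \Theta_t + H_t ((\theta V_t)^{-1} + \Pi_t)^{-1} H_t^T < 0$ when $\theta < 0$ (clearly $\theta \Theta_t + H_t ((\theta V_t)^{-1} + \Pi_t)^{-1} H_t^T$ is always positive definite when $\theta > 0$). Then, from the first-order stationary condition of $z_t$ for optimizing (5.5), we obtain

$$z_t^* = H_t (I + \theta V_t \Pi_t)^{-1} \bar{x}_t \tag{5.6}$$

If $z_t^*$ is substituted into (5.5), then we obtain (see Appendix D for details of the derivation)

$$\Sigma_t = \bar{x}_t^T \Pi_t (\theta V_t \Pi_t + I)^{-1} \bar{x}_t + L_t(Y_t) \tag{5.7}$$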
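The closed forms (5.6) and (5.7) can be cross-checked numerically. The sketch below is an illustration under stated assumptions: it takes the bracketed quadratic of (5.4) to be $x^T \Pi x + \theta^{-1} (x - \bar{x})^T V^{-1} (x - \bar{x}) + \theta^{-1} (z - Hx)^T \Theta^{-1} (z - Hx)$ (the form that also appears in (3.27), with the constant $L_t(Y_t)$ dropped), uses random positive definite matrices with $\theta > 0$, and solves the joint first-order conditions in $(x, z)$ directly rather than extremizing over $x$ and then $z$ as the proof does. The stationary point is the same either way.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, theta = 3, 2, 0.7    # illustrative sizes and a positive theta (assumed)

def spd(k):
    """Random symmetric positive definite k-by-k matrix."""
    m = rng.normal(size=(k, k))
    return m @ m.T + k * np.eye(k)

Pi, V, Th = spd(n), spd(n), spd(r)      # stand-ins for Pi_t, V_t, Theta_t
H = rng.normal(size=(r, n))
xbar = rng.normal(size=n)
Vi, Thi = np.linalg.inv(V), np.linalg.inv(Th)

def g(x, z):
    """Bracketed expression of (5.4) without the constant L_t(Y_t)."""
    ex, ez = x - xbar, z - H @ x
    return x @ Pi @ x + (ex @ Vi @ ex + ez @ Thi @ ez) / theta

# Joint first-order conditions in (x, z), written as one linear system.
K = np.block([[Pi + (Vi + H.T @ Thi @ H) / theta, -H.T @ Thi / theta],
              [-Thi @ H / theta, Thi / theta]])
rhs_vec = np.concatenate([Vi @ xbar / theta, np.zeros(r)])
sol = np.linalg.solve(K, rhs_vec)
x_star, z_star = sol[:n], sol[n:]

M = np.linalg.inv(np.eye(n) + theta * V @ Pi)
z_closed = H @ M @ xbar                       # (5.6)
sigma_closed = xbar @ Pi @ M @ xbar           # (5.7) without L_t(Y_t)

print(np.allclose(z_star, z_closed), np.isclose(g(x_star, z_star), sigma_closed))
```

At the stationary point the measurement residual $z - Hx$ vanishes, which is why the measurement covariance $\Theta_t$ drops out of (5.7).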
For the one-step delayed information pattern, $z_t$ is not available at time $t$. From the assumption about $S^\theta$, the order of the optimization is irrelevant w.r.t. $Z_N, X_N$.

min  extext S^iXs,  Vn)  =  extlP*  +  ext®"‘mt  +  F|] 
zf  Xu  *«  »» 


$$= \mathrm{ext}_{x_t} [x_t^T \Pi_t x_t + \theta^{-1} (x_t - \bar{x}_t)^T V_t^{-1} (x_t - \bar{x}_t) + L_t(Y_t)] \tag{5.8}$$

Extremizing with respect to $z_t$ will make $m_t = 0$. If $\Pi_t + (\theta V_t)^{-1} < 0$ when $\theta < 0$ (clearly, $\Pi_t + (\theta V_t)^{-1}$ is always positive definite when $\theta > 0$), the optimal $x_t^*$ will be as follows.

$$x_t^* = (I + \theta V_t \Pi_t)^{-1} \bar{x}_t \tag{5.9}$$

By substituting $x_t^*$ in (5.9) into (5.8), (5.7) is obtained. Therefore, the classical information pattern and the one-step delayed information pattern have the same $\Sigma_t$. In order to obtain $x_t^*$ in (3.24) and (5.9), and $z_t^*$ in (5.6), we assume $(\theta V_t)^{-1} + \Pi_t + H_t^T (\theta \Theta_t)^{-1} H_t < 0$, $\Pi_t + (\theta V_t)^{-1} < 0$, and $\theta \Theta_t + H_t ((\theta V_t)^{-1} + \Pi_t)^{-1} H_t^T < 0$ when $\theta < 0$. The assumption that $S^\theta$ is positive definite in $U_{N-1}$ and negative definite in $X_N, Z_N$ when $\theta < 0$ guarantees that the three inequalities hold. This assumption on $S^\theta$ guarantees that $J_t(Y_t)$ in (4.2) is finite. If any one of the three inequalities is not satisfied, then $J_t(Y_t)$ will be infinite.

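From (2.9), $U_S \subset U_T \subset U_C$. If $g$ is a strictly convex function with respect to $u$, then

$$\min_{u \in U_S} g \ge \min_{u \in U_T} g \ge \min_{u \in U_C} g$$

If $\min_{u \in U_C} g = \min_{u \in U_S} g$, then $\min_{u \in U_S} g = \min_{u \in U_T} g = \min_{u \in U_C} g$. We write this in a more explicit form at time stage $N - 1$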


£jv-i  =  ext  min  =  min  ext  extS^ 

*u-i  uw-i  uj»_j  2jJ_j  Xu 


Because  Us  C  Ut  C  Uc 

min  ext  ext  S'  >  min  ext  ext  S'  >  min  ext  ext  S' 
uu-i€Os  Xu  *u-i^Ut  Z^_-^Xu  *u-tCl^c  Zfl^^  Xu 

$\Sigma_t$ has been shown to be the same for the classical and one-step delayed information patterns, therefore


min  ext  extS'=  min  ext  extS® 
uu-\€Us  Xu  vu-t€l/c  Z}l{_j  Xu 

From the above two equations


min  ext  ext  S'  =  min  ext  ext  S'  =  min  ext  ext  S' 
uu-i€Us  Z{j_^  Xu  *u-t^VT  Z{S_^  Xu 

At $t = N - 1$ the classical information, one-step delayed information and one-step delayed information-sharing patterns have the same $\Sigma_{N-1}$,

$$\Sigma_{N-1}(Y_{N-1}) = \bar{x}_{N-1}^T \Pi_{N-1} (\theta V_{N-1} \Pi_{N-1} + I)^{-1} \bar{x}_{N-1} + L_{N-1}(Y_{N-1})$$
By induction, using the recursion for $\Sigma_t$, the value of $\Sigma_t$ for the decentralized case with the one-step delayed information-sharing pattern is given by (5.7). □

Remark: From Theorem 5.1 and (4.3), the value of $f_t(u_t^*, z_t^*, \bar{x}_t)$ is the same for the three information patterns. Therefore, from (4.10) and (5.7),

$$\bar{x}_t^T \Pi_t (\theta V_t \Pi_t + I)^{-1} \bar{x}_t = \bar{x}_t^T [Q_{11,t} + Q_{13,t} C_t + (Q_{13,t} C_t)^T + C_t^T Q_{33,t} C_t - \delta_t^T R_t^{-1} \delta_t] \bar{x}_t$$

Remark: $R$ defined in (4.6) is different for different information patterns because $D$ is different for different information patterns. In Theorem 5.1 we showed that the value of $\Sigma_t$ is the same for the three information patterns. Therefore, only $|R|$ in (5.1) makes the cost function different for different information patterns.

5.2 Comparison of two solutions using the classical information pattern

In Speyer, Deyst, and Jacobson [18] the optimal controller for the classical information pattern is derived as a function of the entire smoothed history of the state vector from the initial to the current time. This controller appears to be different from that given in Theorem 3.5. Two reasons for the difference are: 1) as shown below, although the probability density functions are the same, they are expressed in different functional forms, and 2) the order of the extremization (or integration) is different. The probability density function given in Lemma 2.2 is

=  nj^_i[/(ifcllA:)/(XfcIx*_i,U/t-l)]/(*oko)/(lo) 

N-l  N 

oc  exp{  Vn/t-l-V»^fc  +  (3^-io)^V'o“‘(io-xo)} 

fc=0  fc=o 

The probability density function which is used in [18] is

$$f(X_N, Z_N | U_{N-1}) = f(X_N | Z_N, U_{N-1}) \prod_{i=0}^{N-1} f(z_{i+1} | U_i)\, f(z_0)$$

From Bayes' rule, the above two equations are equivalent.

$$f(X_N | Z_N, U_{N-1}) \prod_{i=0}^{N-1} f(z_{i+1} | U_i)\, f(z_0) = \prod_{k=1}^{N} [f(z_k | x_k) f(x_k | x_{k-1}, u_{k-1})]\, f(z_0 | x_0) f(x_0)$$

Therefore, the exponent in [18] is equivalent to $S^\theta$ defined in (2.13). From Lemma 2.1 the integration operation is equivalent to the extremization operation, and from Theorem 3.1 the order of the extremization is irrelevant in deriving the optimal control function. Therefore, below we show that the values of the control given by the controller in Theorem 3.5 and in [18] are equivalent and the values of the performance index are equal.

The optimal control function is given in Theorem 3.5. By substituting $\bar{x}_i$ (3.19) and $u_{i-1}^*$, $i \in \{1, \dots, t\}$, into $u_t^*$, we will obtain

( 

Uj  =  ti((^(,xo)  =  ^  "4-  Fxo  (5.10) 

«=o 
where $\Gamma_i$ and $\Gamma$ depend on $D_i$, $i \in \{0, \dots, t\}$. From Theorem 4.1 $u^*$ is unique, i.e. $D_i$ in (4.8) is unique. Therefore, $\Gamma_i$ and $\Gamma$ are unique, and in a similar way the controller $u_t^*$ in [18] can be reduced to (5.10). Thus, substituting $u_i^*$, $i \in \{0, \dots, t\}$, into $\int J_{t+1}(Y_{t+1})\, dz_{t+1}$, where $J_{t+1}(Y_{t+1})$ is in (2.18), and integrating w.r.t. $z_{t+1}$ will produce the same value of the performance index.

6.  DERIVATION  OF  THE  OPTIMAL  DECENTRALIZED  CONTROL  LAW 

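A recursion based on (5.7) is developed in this section to determine the decentralized control function with the one-step delayed information-sharing pattern. From (4.1) and (5.7),

$$\Sigma_t = \mathrm{ext}_{z_t} \min_{u_t} [\bar{x}_{t+1}^T \tilde{\Pi}_{t+1} \bar{x}_{t+1} + L_{t+1}(Y_{t+1})] \tag{6.1}$$

where $\tilde{\Pi}_{t+1} = \Pi_{t+1} [\theta V_{t+1} \Pi_{t+1} + I]^{-1}$, and $L_{t+1}(Y_{t+1})$ and $\bar{x}_{t+1}$ are given by the forward recursion equations (3.18) and (3.19), respectively. Substitute $L_{t+1}(Y_{t+1})$ and $\bar{x}_{t+1}$ into (6.1), and after considerable simplification $\Sigma_t$ will be (see [9] for the simplification)

$$\begin{aligned} \Sigma_t = \mathrm{ext}_{z_t} \min_{u_t \in U_T} \{ & \bar{x}_t^T [(\theta V_t)^{-1} + (\theta V_t)^{-1} (\Lambda_t A_t^T \tilde{\Pi}_{t+1} A_t \Lambda_t - \Lambda_t)(\theta V_t)^{-1}] \bar{x}_t \\ & + 2 \bar{x}_t^T (\theta V_t)^{-1} (\Lambda_t A_t^T \tilde{\Pi}_{t+1} A_t \Lambda_t - \Lambda_t) H_t^T (\theta \Theta_t)^{-1} z_t \\ & + 2 \bar{x}_t^T (\theta V_t)^{-1} \Lambda_t A_t^T \tilde{\Pi}_{t+1} B_t u_t + 2 z_t^T (\theta \Theta_t)^{-1} H_t \Lambda_t A_t^T \tilde{\Pi}_{t+1} B_t u_t \\ & + z_t^T [(\theta \Theta_t)^{-1} + (\theta \Theta_t)^{-1} H_t (\Lambda_t A_t^T \tilde{\Pi}_{t+1} A_t \Lambda_t - \Lambda_t) H_t^T (\theta \Theta_t)^{-1}] z_t \\ & + u_t^T [R_t + B_t^T \tilde{\Pi}_{t+1} B_t] u_t + L_t(Y_t) \} \\ = \mathrm{ext}_{z_t} \min_{u_t \in U_T} [ & f_t + L_t(Y_t)] \end{aligned} \tag{6.2}$$

where

$$f_t = \bar{x}_t^T Q_{11,t} \bar{x}_t + 2 \bar{x}_t^T Q_{12,t} z_t + 2 \bar{x}_t^T Q_{13,t} u_t + z_t^T Q_{22,t} z_t + 2 z_t^T Q_{23,t} u_t + u_t^T Q_{33,t} u_t$$

$$\begin{aligned} Q_{11,t} &= (\theta V_t)^{-1} + (\theta V_t)^{-1} [\Lambda_t A_t^T \tilde{\Pi}_{t+1} A_t \Lambda_t - \Lambda_t] (\theta V_t)^{-1} \\ Q_{12,t} &= (\theta V_t)^{-1} [\Lambda_t A_t^T \tilde{\Pi}_{t+1} A_t \Lambda_t - \Lambda_t] H_t^T (\theta \Theta_t)^{-1}, \qquad Q_{13,t} = (\theta V_t)^{-1} \Lambda_t A_t^T \tilde{\Pi}_{t+1} B_t \\ Q_{22,t} &= (\theta \Theta_t)^{-1} + (\theta \Theta_t)^{-1} H_t [\Lambda_t A_t^T \tilde{\Pi}_{t+1} A_t \Lambda_t - \Lambda_t] H_t^T (\theta \Theta_t)^{-1} \\ Q_{23,t} &= (\theta \Theta_t)^{-1} H_t \Lambda_t A_t^T \tilde{\Pi}_{t+1} B_t, \qquad Q_{33,t} = R_t + B_t^T \tilde{\Pi}_{t+1} B_t \end{aligned} \tag{6.3}$$

and $\tilde{\Pi}_{t+1}$ and $\Lambda_t$ are defined as

$$\tilde{\Pi}_{t+1} = \Pi_{t+1} [\theta V_{t+1} \Pi_{t+1} + I]^{-1}, \qquad \Lambda_t = [(\theta V_t)^{-1} + Q_t + H_t^T (\theta \Theta_t)^{-1} H_t]^{-1} \tag{6.4}$$

Notice that the value of $\tilde{\Pi}_{t+1}$ is invariant for the three information patterns.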

6.1  The  decentralized  controller 

Notice that $L_t(Y_t)$ in (6.2) is not a function of $z_t$ and $u_t$, so it acts only as an added constant to the performance index in Lemma 4.2. Applying Lemma 4.2 to (6.2) yields the optimal decision gain equations at time stage $t$. In particular, the optimal decentralized controller at stage time $t$ equals,


A*  -  -  0 

-f- 

’  Cl  ' 

0  A" 

(6.5) 


and the optimal decision gains $D_t^i$ and $C_t^i$ are determined from the following coupled matrix equations.


Di  =  -(iZ*/  +  -  (/9j)’'a;7j]  (6.6) 

q  =  -[/Z‘‘  +  Brn.+,B‘-(/9:)^ai/9i)-‘IBrn.+iA.A,(^VO-‘+  E  (6.7) 

where  /?*  is  the  diagonal  matrix  defined  in  (2.8),  S{  is  given  in  (2.3),  Bj'^d  a\  are  defined  in  section  4 
uang  (6.3),  and  t,  j  e  {1,  •  •  - ,  M). 

Using the uniqueness of the optimal controller for the one-step delayed information pattern, a simpler form of $C_t$ for the one-step delayed information-sharing pattern is given in the following lemma.

Lemma 6.1: A simpler form of the coefficient matrix $C_t$ for the one-step delayed information-sharing pattern is

$$C_t = (k_t - D_t H_t)(I + \theta V_t \Pi_t)^{-1} \tag{6.8}$$

where $D_t$ is given in (6.6). Furthermore, $z_t^*$ is the same for the three information patterns: the classical, one-step delayed, and one-step delayed information-sharing patterns.

Proof. The notation $(\cdot)$ is used to distinguish the coefficients $C_t$, $D_t$ and $u_t^*$ for different information patterns. Therefore, (1), (2), and (3) denote the one-step delayed, the one-step delayed information-sharing, and an alternative form of the one-step delayed information patterns, respectively. The optimal $u_t$ and $z_t$ for the one-step delayed information pattern [21] are

$$u_t^*(1) = C_t(1) \bar{x}_t + D_t(1) z_t = k_t (I + \theta V_t \Pi_t)^{-1} \bar{x}_t$$

$$z_t^*(1) = H_t (I + \theta V_t \Pi_t)^{-1} \bar{x}_t \tag{6.9}$$

where $D_t(1)$ is a zero matrix and

$$C_t(1) = k_t (I + \theta V_t \Pi_t)^{-1} \tag{6.10}$$

For the one-step delayed information-sharing pattern

$$u_t^*(2) = C_t(2) \bar{x}_t + D_t(2) z_t = (C_t(2) - D_t(2) R_t^{-1} \delta_t) \bar{x}_t + D_t(2)(z_t + R_t^{-1} \delta_t \bar{x}_t) \tag{6.11}$$
$$z_t^*(2) = -R_t^{-1} \delta_t \bar{x}_t \tag{6.12}$$

where $C_t(2)$, $D_t(2)$ are given in (6.5)-(6.7), and $R_t$, $\delta_t$ are given in (4.6) and (4.7). Equation (6.11) will be needed to derive a simpler form of $C_t(2)$. From Theorem 5.1, the exponent recursion $\Sigma_t(Y_t)$ is the same for the three information patterns: the one-step delayed, classical, and one-step delayed information-sharing patterns. From (5.7)

$$\Sigma_t(Y_t) = \bar{x}_t^T \Pi_t (\theta V_t \Pi_t + I)^{-1} \bar{x}_t + L_t(Y_t) = f_t(u_t^*, z_t^*, \bar{x}_t) + L_t(Y_t)$$

where $f_t(u_t^*, z_t^*, \bar{x}_t)$ is defined in (6.2) and $L_t(Y_t)$ is a constant term when $Y_t$ is given. Therefore, $f_t(u_t^*, z_t^*, \bar{x}_t)$ is the same for the three different information patterns.

$$f_t(u_t^*(1), z_t^*(1), \bar{x}_t) = f_t(u_t^*(2), z_t^*(2), \bar{x}_t) = \bar{x}_t^T \Pi_t (\theta V_t \Pi_t + I)^{-1} \bar{x}_t \tag{6.13}$$

When we consider $f_t(u_t^*, z_t^*, \bar{x}_t)$, $z_t^*$ in (6.12) will eliminate the second term in (6.11). Through observing (6.11) and (6.12), we can define $z_t^*(3)$, $u_t^*(3)$ as in (6.15)-(6.17) and obtain the following relation

$$f_t(u_t^*(2), z_t^*(2), \bar{x}_t) = f_t(u_t^*(3), z_t^*(3), \bar{x}_t) \tag{6.14}$$

$$u_t^*(3) = C_t(3) \bar{x}_t + D_t(3) z_t = (C_t(2) - D_t(2) R_t^{-1} \delta_t) \bar{x}_t \tag{6.15}$$

$$z_t^*(3) = -R_t^{-1} \delta_t \bar{x}_t \tag{6.16}$$

$$D_t(3) = [0], \qquad C_t(3) = C_t(2) - D_t(2) R_t^{-1} \delta_t \tag{6.17}$$

From (6.13) and (6.14)

$$f_t(u_t^*(1), z_t^*(1), \bar{x}_t) = f_t(u_t^*(3), z_t^*(3), \bar{x}_t) = \bar{x}_t^T \Pi_t (\theta V_t \Pi_t + I)^{-1} \bar{x}_t \tag{6.18}$$


FVom  (6.18),  u*(3),  z*  (3)  is  also  the  optimal  value  for  the  one-step  delayed  information  pattern.  The  corollary 
4.5  in  [1]  and  Theorem  1.11  in  [20]  give  sufficient  conditions  of  the  unique  optimal  value  for  a  strictly  convex- 
concave  function  and  a  strictly  convex  function,  respectively.  Since  S'  is  strictly  convex  with  respect  to 
when  0  >  0  and  S'  is  strictly  concave  with  respect  to  Xn,Zi  and  strictly  convex  w.r.t. 
when  6  <  0,  then  the  sufficient  conditions  for  the  unique  optimal  value  are  satisfied.  Therefore, 
the  optimal  value  of  Ut,zt  for  the  one-step  delayed  information  pattern  is  unique,  i.e.  uj(3)  =  «J(1)  and 
«?(3)  =  xr(l)-  FVom  (6.9)  and  (6.16) 

-  fq:HtXt  =  //,(/  -I-  0Vtn.)-'i,  (6.19) 
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FYom  (6.10)(6.17)  and  u?(3)  =  uj(l) 


Q(3)  =  C.(l)  =  k,il  +  flV.nj-*  =  C.(2)  - 

From  the  above  two  equations,  the  following  simplified  equation  for  C(  follows  as 

Ct(2)  =  Dt(2)R:^6t  +  Ct(3)  =  A(2)/ir‘«*  +  *«(/  +  =  (*«  -  A(2)W.)(/  +  (6.20) 

FYom  (6.9),  (6.19)  and  (5.6),  2*  is  the  same  for  the  three  different  information  patterns  :  the  one-step  delayed, 
one-step  delayed  information-sharing  and  classical  information  pattems.D 

By  using  (6.8),  u*  with  OSDISP  is  written  as 

=  k^ii +evtnt)-^xt  +  Dtizt  -  Htii  +  evtnt)-'xt)  (6.21) 

The above form also exists for the classical and one-step delayed information patterns, where $D_t = 0$ for
the one-step delayed information pattern and $D_t = k_t (I + \theta V_t \Pi_t + V_t H_t^T \Theta_t^{-1} H_t)^{-1} V_t H_t^T \Theta_t^{-1}$ for the classical
information pattern. Since $z_t^* = H_t (I + \theta V_t \Pi_t)^{-1} \hat{x}_t$ will eliminate the second term of (6.21), only the first
term influences the value of $f_t(u_t^*, z_t^*, \hat{x}_t)$. Therefore, the values of $f_t$ and $L_t(Y_t)$ are the same for the three
information patterns. □
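
The cancellation argument above can be checked numerically. The following is a minimal sketch, assuming the form of (6.21) as read from the text; the dimensions, the random matrices and the names `k_gain`, `core` and `control` are hypothetical stand-ins, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 3, 2, 2  # hypothetical state, measurement and control dimensions


def random_spd(k):
    """Return a random symmetric positive definite k x k matrix."""
    a = rng.standard_normal((k, k))
    return a @ a.T + k * np.eye(k)


theta = 0.1                             # assumed value of the risk parameter
V, Pi = random_spd(n), random_spd(n)    # stand-ins for V_t and Pi_t
H = rng.standard_normal((m, n))         # stand-in for H_t
k_gain = rng.standard_normal((p, n))    # stand-in for k_t
x_hat = rng.standard_normal(n)          # stand-in for the state estimate

# Common factor (I + theta V_t Pi_t)^{-1} x_hat_t appearing in (6.21).
core = np.linalg.solve(np.eye(n) + theta * V @ Pi, x_hat)
z_star = H @ core                       # z_t^* as in (5.6)


def control(D, z):
    """Evaluate the right-hand side of (6.21) for gain D and input z."""
    return k_gain @ core + D @ (z - H @ core)


u_delayed = control(np.zeros((p, m)), z_star)               # D_t = 0
u_arbitrary = control(rng.standard_normal((p, m)), z_star)  # arbitrary D_t
print(np.allclose(u_delayed, u_arbitrary))
```

At $z_t = z_t^*$ the bracketed term of (6.21) vanishes, so the two controls agree whatever gain $D_t$ is used; away from $z_t^*$ they generally differ.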

Remark: Note that if $C_t$ is chosen as in (6.8) with arbitrary $D_t$, then $f_t(u_t^*, z_t^*, \hat{x}_t)$ and $L_t(Y_t)$ still retain
the same values.

The necessary and sufficient conditions for the decentralized controller with the one-step delayed
information-sharing pattern are listed in the following theorem.

Theorem 6.1: Consider the dynamic LEGT one-step delayed information-sharing problem specified by
(2.1)-(2.8). There is a unique finite optimal team control law at time t given as

$$u_t^i = D_t^i z_t^i + C_t^i \hat{x}_t, \quad i \in \{1, \dots, M\} \tag{6.22}$$

which  yields  finite  cost,  if  and  only  if  the  following  conditions  are  satisfied, 

1.  V~^  +  QQ,  +  >  0  for  all  s  <  t  4- 1  when  0  <  0 

2.  4-  W~^  >  0  for  all  N  >  s  >  t  -Fl  when  6  <0. 

3.  -f-  dllt+i  >  0  when  6  <0 

4a.  Rt  —  Df  Qss^t^t  4-  Qn.tDt  4-  (Qzs.t  A)^  +  Q22,t  <  0  when  0  <  0 
4b.  Rt  >0  and  t?33  j  -  (y3J)^aJ^  >  0  when  5  >  0 

where the estimate of the state based upon the one-step delayed information pattern, $\hat{x}_t$, is given by equation
(3.19), the optimal gain matrices $D_t^i$ and $C_t^i$ for $i \in \{1, \dots, M\}$ are determined from the coupled matrix
equations (6.6) and (6.7), or the simplified form of $C_t$ in (6.20). In addition, the matrix At is determined by
(6.4).

Proof.  In  order  to  obtain  the  decentralized  controller  at  time  t,  Pi4.\,Lt+i,  Ft+i  in  Theorem  3.3  and  3.4, 
in  (3.24)  and  (5.9),  in  (5.6),  Dl  in  (6.6)  and  Q  in  (6.7)  need  exist.  Condition  1  guarantees  the 
existence  of  Pt+i,Lt+i  (see  Appendix  A).  Condition  2  is  from  (3.11)  and  guarantees  the  existence  of  ^.+1- 
Condition 3 guarantees that in (3.24) and (5.9), and in (5.6) exist (see Section 5.1). From Lemma
4.1, condition 4 and $Q_{33,t} > 0$ will guarantee the existence of $D_t^i$ and $C_t^i$. Condition 2 implies > 0.
Notice that the assumption $R_t > 0$, condition 3 and $\Pi_{t+1} > 0$ imply $Q_{33,t} > 0$. We prove necessity as follows.
If any one of conditions 1-4 does not hold, then from (2.10) and (5.2) the expected value of the cost will
become infinite, regardless of the choice of the control function. This implies the necessity of conditions 1-4. □

Observe that conditions 1-4 in the above theorem are not contradictory, and since we are working
with a discrete-time system there is only a finite number of conditions. Hence, there exists an $\epsilon > 0$
such that conditions 1-4 are satisfied for all $|\theta| < \epsilon$. Therefore, $D_t^i$ and $C_t^i$ can be determined through
Theorem 6.1.
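
The existence of such an $\epsilon$ can be illustrated for one positivity condition of the type used in Appendix A, namely $V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t > 0$. The sketch below is only an illustration under assumed random data: the matrices, the grid of $\theta$ values and the helper `min_eig` are hypothetical, and a single condition at a single stage is tested rather than the full set 1-4.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 2  # hypothetical state and measurement dimensions


def random_spd(k):
    """Return a random symmetric positive definite k x k matrix."""
    a = rng.standard_normal((k, k))
    return a @ a.T + k * np.eye(k)


V, Q, Theta = random_spd(n), random_spd(n), random_spd(m)
H = rng.standard_normal((m, n))
# Part of the condition that does not depend on theta.
base = np.linalg.inv(V) + H.T @ np.linalg.inv(Theta) @ H


def min_eig(theta):
    """Smallest eigenvalue of V^{-1} + theta Q + H^T Theta^{-1} H."""
    return np.linalg.eigvalsh(base + theta * Q).min()


thetas = np.linspace(-1.0, 1.0, 201)
holds = np.array([min_eig(th) > 0 for th in thetas])
violations = thetas[~holds]
# Grid estimate of epsilon: distance from 0 to the nearest violating theta.
eps = np.abs(violations).min() if violations.size else np.abs(thetas).max()
print(eps)
```

Because the condition holds strictly at $\theta = 0$ and depends continuously on $\theta$, the grid search returns a strictly positive estimate of $\epsilon$ for this one condition.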


7.  CONCLUSION 

The results presented here extend known optimal solutions of centralized and decentralized control problems
with an exponential cost criterion, and in addition set forth an innovative methodology for the solution
of optimal team problems with Gaussian noise processes and an exponential cost function. In Sections 2 and 3
we discussed the optimal solution of the centralized LEG problem under the hypothesis that the controller
is constructed from the past information and the current observation of the state. This extends the results of
Whittle [21], who only considered a control law based upon the past information. It is shown that many of
the prevalent concepts associated with the LQG problem, such as the optimality of linear Markov controllers,
Riccati equations and the separation principle, hold for this class of problems in a somewhat modified form.
Lemmas 4.1, 4.2 and Theorem 4.1 give sufficient conditions for global optimality of the LEG static team
problem for $\theta \neq 0$. For $\theta > 0$, the exponential of the quadratic function is nonconvex but unimodal. This
extends the result of Krainak et al. [11], who only consider the convex exponential function ($\theta < 0$). Using
the optimal control for the classical information and the one-step delayed information patterns, we prove in
Theorem 5.1 that the value of the argument of the exponential is the same for different information patterns.
Therefore, only |fi| in the coefficient of the exponential cost function changes for different information.
Through Theorem 5.1, an innovative method of deriving the optimal control gains for the team problem is
given. The backward iterative process in [10] is no longer needed. In Section 6 we examined the complete
dynamic LEGT problem with the one-step delayed information-sharing pattern for $\theta \neq 0$, and derived a set
of coupled algebraic equations satisfied by the optimal decision gains at each time stage. This extends the
results of Krainak et al. [12], who examined a cost criterion in which only the terminal state penalties are
present. The results in Section 6 are also applied to the nonconvex exponential function of the quadratic
function ($\theta > 0$), whereas in [12] only the convex exponential function ($\theta < 0$) is considered.

Given the continuous results of [4] and [16], the centralized and decentralized synthesis results here, given
certain restrictions, generalize current $H_\infty$ results to time-varying and time-invariant discrete-time systems
over finite and infinite time horizons. Furthermore, in the derivations of the LEG controller a quadratic
cost is constructed which is minimized with respect to the control variables, but maximized with respect to
the input uncertainties when $\theta < 0$. This indicates a relationship between the LEG problem and game theory.

APPENDICES 


A 


Proof  of  Theorem  3.4. 

The proof of (3.17) and (3.19) follows Whittle's paper. The derivation of $L_t(Y_t)$ in (3.18) can be found in [9].
However, a simpler derivation of $L_t(Y_t)$ will be given in this section. The forward recursion (3.14) is as
follows.

$$P_{t+1}(x_{t+1}, Y_{t+1}) = \operatorname*{ext}_{x_t} \left[ P_t(x_t, Y_t) + x_t^T Q_t x_t + u_t^T R_t u_t + \theta^{-1} \left( n_t(x_{t+1}, x_t, u_t) + m_t(z_t, x_t) \right) \right] \tag{A.1}$$

If the relation $P_t(x_t, Y_t) = (x_t - \hat{x}_t)^T (\theta V_t)^{-1} (x_t - \hat{x}_t) + L_t(Y_t)$ holds at time t, then substitute $P_t(x_t, Y_t)$
into (A.1). We see that the quadratic form also holds for $P_{t+1}(x_{t+1}, Y_{t+1})$. The value $\hat{x}_{t+1}$ is the value
of the state which extremizes $P_{t+1}(x_{t+1}, Y_{t+1})$. We know by assumption that at t = 0 the quadratic form
$P_0(x_0, Y_0) = (x_0 - \hat{x}_0)^T (\theta V_0)^{-1} (x_0 - \hat{x}_0)$ holds.

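Writing down the stationary conditions of (A.1) with respect to $x_{t+1}$ and $x_t$, where $\hat{x}_{t+1}$ and $x_t^*$ are the values
extremizing $P_{t+1}(x_{t+1}, Y_{t+1})$, results in

$$W_t^{-1} (\hat{x}_{t+1} - A_t x_t^* - B_t u_t) = 0 \tag{A.2}$$

$$-A_t^T W_t^{-1} (\hat{x}_{t+1} - A_t x_t^* - B_t u_t) + \theta Q_t x_t^* + V_t^{-1} (x_t^* - \hat{x}_t) - H_t^T \Theta_t^{-1} (z_t - H_t x_t^*) = 0 \tag{A.3}$$

Substituting (A.2) into (A.3), the first term in (A.3) is zero. So from (A.3), we obtain

$$x_t^* = (V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t)^{-1} V_t^{-1} \hat{x}_t + (V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t)^{-1} H_t^T \Theta_t^{-1} z_t \tag{A.4}$$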
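From (A.2) and (A.4)

$$\hat{x}_{t+1} = A_t x_t^* + B_t u_t$$

$$\hat{x}_{t+1} = A_t (V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t)^{-1} V_t^{-1} \hat{x}_t + A_t (V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t)^{-1} H_t^T \Theta_t^{-1} z_t + B_t u_t$$

Rewrite the first term

$$\begin{aligned}
A_t (V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t)^{-1} V_t^{-1} \hat{x}_t
&= A_t (V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t)^{-1} \left[ (V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t) - (\theta Q_t + H_t^T \Theta_t^{-1} H_t) \right] \hat{x}_t \\
&= A_t \hat{x}_t - A_t (\theta Q_t + H_t^T \Theta_t^{-1} H_t + V_t^{-1})^{-1} (\theta Q_t + H_t^T \Theta_t^{-1} H_t) \hat{x}_t
\end{aligned}$$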
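
The rewriting of the first term is a purely algebraic identity and can be checked numerically. The sketch below assumes random, hypothetical data for $V_t$, $Q_t$, $\Theta_t$, $A_t$, $H_t$ and $\hat{x}_t$, with $\theta > 0$ chosen so that the matrix being inverted is positive definite.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2  # hypothetical state and measurement dimensions


def random_spd(k):
    """Return a random symmetric positive definite k x k matrix."""
    a = rng.standard_normal((k, k))
    return a @ a.T + k * np.eye(k)


theta = 0.05                       # assumed risk parameter
V, Q, Theta = random_spd(n), random_spd(n), random_spd(m)
A = rng.standard_normal((n, n))    # stand-in for A_t
H = rng.standard_normal((m, n))    # stand-in for H_t
x_hat = rng.standard_normal(n)     # stand-in for the state estimate

V_inv = np.linalg.inv(V)
G = theta * Q + H.T @ np.linalg.inv(Theta) @ H   # theta Q + H^T Theta^{-1} H
M = V_inv + G                                    # matrix inverted in (A.4)

# Left-hand side: first term of the x_hat_{t+1} expression.
lhs = A @ np.linalg.solve(M, V_inv @ x_hat)
# Right-hand side: rewritten form A x_hat - A M^{-1} G x_hat.
rhs = A @ x_hat - A @ np.linalg.solve(M, G @ x_hat)
print(np.allclose(lhs, rhs))
```

The identity only uses $V_t^{-1} = M - G$, so it holds for any data for which $M$ is invertible.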

Therefore,  (3.19)  is  obtained.  The  following  matrix  is  associated  with  the  quadratic  in  xt+i,xt  of  (A.l) 


0  7 

-WC^At 

-Alw-^  »/-’  +  eQt  +  Hfei^Ht  +  AjW-^At 

If $V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t + A_t^T W_t^{-1} A_t > 0$ ($\theta > 0$ or $\theta < 0$), by extremizing (A.1) w.r.t. $x_t$, we can define
=  Then 

Vt+i  =  {P-  -  7^^' W)~*7^)9“*  (A.5) 

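The derivation of $L_{t+1}(Y_{t+1})$ in (3.18) is given as follows. From the relation $P_{t+1}(x_{t+1}, Y_{t+1}) = (x_{t+1} - \hat{x}_{t+1})^T (\theta V_{t+1})^{-1} (x_{t+1} - \hat{x}_{t+1}) + L_{t+1}(Y_{t+1})$, we know

$$\begin{aligned}
L_{t+1}(Y_{t+1}) &= \operatorname*{ext}_{x_{t+1}} P_{t+1}(x_{t+1}, Y_{t+1}) \\
&= \operatorname*{ext}_{x_{t+1}} \operatorname*{ext}_{x_t} \left[ P_t(x_t, Y_t) + x_t^T Q_t x_t + u_t^T R_t u_t + \theta^{-1} \left( n_t(x_{t+1}, x_t, u_t) + m_t(z_t, x_t) \right) \right]
\end{aligned}$$

Extremizing with respect to $x_{t+1}$ will eliminate the term $n_t(x_{t+1}, x_t, u_t)$,

$$\begin{aligned}
L_{t+1}(Y_{t+1}) &= \operatorname*{ext}_{x_t} \left[ (x_t - \hat{x}_t)^T (\theta V_t)^{-1} (x_t - \hat{x}_t) + L_t(Y_t) + x_t^T Q_t x_t + u_t^T R_t u_t + (z_t - H_t x_t)^T (\theta \Theta_t)^{-1} (z_t - H_t x_t) \right] \\
&= \operatorname*{ext}_{x_t} \left\{ x_t^T \left[ (\theta V_t)^{-1} + H_t^T (\theta \Theta_t)^{-1} H_t + Q_t \right] x_t - 2 x_t^T \left[ (\theta V_t)^{-1} \hat{x}_t + H_t^T (\theta \Theta_t)^{-1} z_t \right] \right. \\
&\qquad \left. + \hat{x}_t^T (\theta V_t)^{-1} \hat{x}_t + u_t^T R_t u_t + z_t^T (\theta \Theta_t)^{-1} z_t + L_t(Y_t) \right\}
\end{aligned} \tag{A.6}$$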

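If $V_t^{-1} + H_t^T \Theta_t^{-1} H_t + \theta Q_t > 0$ for $\theta < 0$ or $\theta > 0$, then the optimal value of $x_t$ for (A.6) is

$$x_t^* = (V_t^{-1} + H_t^T \Theta_t^{-1} H_t + \theta Q_t)^{-1} (V_t^{-1} \hat{x}_t + H_t^T \Theta_t^{-1} z_t) \tag{A.7}$$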

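Substituting $x_t^*$ back into (A.6), we obtain

$$\begin{aligned}
L_{t+1}(Y_{t+1}) &= \hat{x}_t^T (\theta V_t)^{-1} \hat{x}_t + z_t^T (\theta \Theta_t)^{-1} z_t + u_t^T R_t u_t + L_t(Y_t) \\
&\quad - \theta^{-1} (V_t^{-1} \hat{x}_t + H_t^T \Theta_t^{-1} z_t)^T (V_t^{-1} + H_t^T \Theta_t^{-1} H_t + \theta Q_t)^{-1} (V_t^{-1} \hat{x}_t + H_t^T \Theta_t^{-1} z_t)
\end{aligned} \tag{A.8}$$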
In order to obtain $V_{t+1}$ (A.5) and $x_t^*$ (A.7), it is assumed that $V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t + A_t^T W_t^{-1} A_t > 0$
and $V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t > 0$ when $\theta < 0$ or $\theta > 0$. It will be proved in Lemma
3.1 that $V_t^{-1} + H_t^T \Theta_t^{-1} H_t + \theta Q_t > 0$ if and only if $V_t^{-1} + H_t^T \Theta_t^{-1} H_t + \theta Q_t + A_t^T W_t^{-1} A_t > 0$, and in turn
the inequality $V_t > 0$ holds. Notice that when $\theta > 0$, the above two inequalities always hold.

B 


Proof  of  Lemma  3.1 

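From the matrix inverse identity

$$Q^{-1} - Q^{-1} A (R + A^T Q^{-1} A)^{-1} A^T Q^{-1} = (Q + A R^{-1} A^T)^{-1} \tag{B.1}$$

and (3.17)

$$\begin{aligned}
V_{t+1} &= W_t + A_t \left[ (V_t^{-1} + \theta Q_t + A_t^T W_t^{-1} A_t + H_t^T \Theta_t^{-1} H_t) - A_t^T W_t^{-1} A_t \right]^{-1} A_t^T \\
&= W_t \left[ W_t^{-1} - W_t^{-1} A_t \left( A_t^T W_t^{-1} A_t - (V_t^{-1} + \theta Q_t + A_t^T W_t^{-1} A_t + H_t^T \Theta_t^{-1} H_t) \right)^{-1} A_t^T W_t^{-1} \right] W_t \\
&= W_t \left[ W_t - A_t (V_t^{-1} + \theta Q_t + A_t^T W_t^{-1} A_t + H_t^T \Theta_t^{-1} H_t)^{-1} A_t^T \right]^{-1} W_t \\
&= \left[ W_t^{-1} - W_t^{-1} A_t (V_t^{-1} + \theta Q_t + A_t^T W_t^{-1} A_t + H_t^T \Theta_t^{-1} H_t)^{-1} A_t^T W_t^{-1} \right]^{-1}
\end{aligned}$$

$$W_t^{-1} - V_{t+1}^{-1} = W_t^{-1} A_t (V_t^{-1} + \theta Q_t + A_t^T W_t^{-1} A_t + H_t^T \Theta_t^{-1} H_t)^{-1} A_t^T W_t^{-1} \tag{B.2}$$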

If (3.21) holds, then

$$W_t^{-1} - V_{t+1}^{-1} > 0 \;\Rightarrow\; V_{t+1} > W_t > 0 \;\Rightarrow\; V_{t+1} > 0 \tag{B.3}$$

From (3.17)

$$A_t (V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t)^{-1} A_t^T = V_{t+1} - W_t > 0 \tag{B.4}$$

Because $V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t$ is assumed nonsingular, (3.20) is obtained:

$$V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t > 0$$

If (3.20) holds, reverse the above derivation (B.4)-(B.2), and the inequality (3.21) must be true. Alternatively, by
adding $A_t^T W_t^{-1} A_t \ge 0$ to (3.20), the inequality (3.21) is obtained. From (3.17) and (3.20), (B.3) is obtained
and $V_t$ is positive definite.
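
The steps of this appendix can be sanity-checked numerically. The sketch below assumes the update $V_{t+1} = W_t + A_t (V_t^{-1} + \theta Q_t + H_t^T \Theta_t^{-1} H_t)^{-1} A_t^T$ as read from (B.4), and uses random, hypothetical matrices with $\theta > 0$; the names `N` and `M` for the matrices of (3.20) and (3.21) are introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 2  # hypothetical state and measurement dimensions


def random_spd(k):
    """Return a random symmetric positive definite k x k matrix."""
    a = rng.standard_normal((k, k))
    return a @ a.T + k * np.eye(k)


theta = 0.2                                  # assumed risk parameter
V, Q, W, Theta = random_spd(n), random_spd(n), random_spd(n), random_spd(m)
A = rng.standard_normal((n, n))              # stand-in for A_t
H = rng.standard_normal((m, n))              # stand-in for H_t
inv = np.linalg.inv

N = inv(V) + theta * Q + H.T @ inv(Theta) @ H   # matrix in (3.20)
M = N + A.T @ inv(W) @ A                        # matrix in (3.21)
V_next = W + A @ inv(N) @ A.T                   # V_{t+1} as read from (B.4)
W_inv = inv(W)

# (B.1) with Q -> W_t and R -> -M, as used in the chain for V_{t+1}.
lhs_b1 = inv(W - A @ inv(M) @ A.T)
rhs_b1 = W_inv - W_inv @ A @ inv(-M + A.T @ W_inv @ A) @ A.T @ W_inv

# (B.2): W^{-1} - V_{t+1}^{-1} = W^{-1} A M^{-1} A^T W^{-1}.
lhs_b2 = W_inv - inv(V_next)
rhs_b2 = W_inv @ A @ inv(M) @ A.T @ W_inv

print(np.allclose(lhs_b1, rhs_b1), np.allclose(lhs_b2, rhs_b2))
print(np.linalg.eigvalsh(V_next).min() > 0)
```

With these data both identities hold to rounding error and $V_{t+1}$ is positive definite, in line with (B.3).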


C


First, a three-member team problem is solved. Then, it is only an easy exercise to solve the M-member team
problem. To solve the three-member team problem, we formulate as in [19] or chapter 3 of [9] a person-by-person
optimization problem. For the i-th member, $i \in \{1, 2, 3\}$, each of the other two control functions is at
its one-person optimum. Substitute $u^j = D^j z^j + C^j x$, $j \neq i$ and $i, j \in \{1, 2, 3\}$, into the function $f$
of (4.5)

3  3 

minext[x^(?nx-»-2xVij2*-f  V  2x^<?i2Z^ -1- 2xVi3^' +  2x^[  E  +  C^^)] 

3  3  3 

E  2(*‘r<3a*'+  E  (»*)’'!  E  + 

3  3  3 

-h2(2*)^[  Qg(D^z^  +  C^x)j+  ^  (^‘)^(  ^  Q^(D^z^  +  C^x)]  +  (u*)^Q^u* 

3  3  3 

+2(uYl  E  +  E  +  ^ 

j=lj^t  fc=l,fcj4i  }=ij¥» 

The stationary conditions with respect to $u^i$ and $z(i)$ yield

3 

<?^W  +  KQ‘l3r+  E  +  +  =  0  (C.l) 

[(a<)-‘2(t)  -I-  S^x  -t-  7*2<  -I-  /3*u*]  «  0  (C.2) 

By solving z(i) explicitly from (C.2) and substituting into (C.1), (4.8) and (4.9) are obtained, where M = 3.


.  j  D^QUD^  +  Q^D^  +  {QUD^)-^  +  Q^  D-^^QUl^  +  Q^l^-i-mD^r  +  Q^ 


(a>)-'  = 


-h  {Q^D^V + Qii 


(a^)-  = 


+ cm 


3  ,  ^  -1-  +  Q\l  ^  (q|izj1)T  +  qI2 

D^QUD^  +  QUD^  +  (Q^^D^V  +  QiJ  +  QUD^  +  (Qi|/?")^  +  Qii  J 


0^^  +  D”-QU 


Qii  +  z>^Qg 
,  ^  ^  ^  f  <y3  =  [  5>

^1  ^  (<??2  +  QhDY  +  +  QUc^  +  D^iQllc^  +  i?|3<:73) 

(Q?2  +  + 0Uc^  +  D^iQUc^  +  QUC^) 

^2  ^  (Q}2  +  QlzD^V  +  +  Q^C3) 

(Qi2  +  +  <?!iC7‘  +  CMC^  +  r>3^(0ilc»  +  q^c^) 

^3  ^  [  (<3i2  +  +  QM<^‘  +  +  D^'^iQlkc^  +  Qilc^)  ■ 

[  (Q?2  +  Q\zDY  +  QUC^ + QllC^  +  D^iQUc^  +  QUc^) 

Next  we  will  discuss  second-order  sufficiency.  By  observing  (C.1)  and  (C.2),  if  a’  >  0{6  >  0),  o‘  <  0(0  <
0)  and  ^3^  >  0,  then  the  control  gain  D*  of  (4.8)  and  C*  of  (4.9)  for  person-by-person  optimization  can  be 
solved.  The  assumption  ft  >  0(0  >  0)  and  <  0(0  <  0)  implies  a'  >  0(0  >  0),  a’  <  0(0  <  0)  ,  where  a*  is 
a  minor  of 

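The optimal cost $f(u^*, z^*, x)$ in (4.10) is to be determined. We substitute the optimal team control
function $u^* = Cx + Dz$ into the function $f$ in (4.5), where C and D are given in (4.8) and (4.9):

$$\begin{aligned}
f(u^*, z^*, x) &= \operatorname*{ext}_{z} \left[ x^T Q_{11} x + 2 x^T Q_{12} z + 2 x^T Q_{13} u + z^T Q_{22} z + 2 z^T Q_{23} u + u^T Q_{33} u \right] \\
&= \operatorname*{ext}_{z} \left[ z^T (D^T Q_{33} D + Q_{23} D + (Q_{23} D)^T + Q_{22}) z + 2 x^T (C^T Q_{33} D + Q_{13} D + (Q_{23} C)^T + Q_{12}) z \right. \\
&\qquad \left. + x^T (C^T Q_{33} C + Q_{13} C + (Q_{13} C)^T + Q_{11}) x \right]
\end{aligned} \tag{C.7}$$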

Define $R$ and $S$ as in (4.6) and (4.7) respectively. Then, the stationary condition yields $Rz + Sx = 0$.

Note that "ext" means "min" for $\theta > 0$ and "max" for $\theta < 0$. Assume $R > 0$ when $\theta > 0$ or $R < 0$
when $\theta < 0$. Then, the equation $z = -R^{-1} S x$ optimizes the function in (C.7) and (4.10) is obtained. If the
assumption $R > 0$ ($\theta > 0$) or $R < 0$ ($\theta < 0$) is not satisfied, then the expected value of the cost function will
be infinite or negative infinite. (See equations (5.2) to (5.3).)
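
The reduction in (C.7) and the stationary point $z = -R^{-1} S x$ can be checked numerically for the case $\theta > 0$ (a minimization). The sketch below is an illustration only: it assumes a jointly positive definite weighting over $(x, z, u)$ and unstructured random gains $C$ and $D$, ignoring the per-member structure of the team problem, and the matrix name `T` for the $x$-quadratic coefficient is introduced here for convenience.

```python
import numpy as np

rng = np.random.default_rng(3)
nx, nz, nu = 3, 2, 2  # hypothetical dimensions of x, z and u

# Jointly positive definite weighting, partitioned into the blocks Q_ij.
a = rng.standard_normal((nx + nz + nu, nx + nz + nu))
Qf = a @ a.T + np.eye(nx + nz + nu)
Q11, Q12, Q13 = Qf[:nx, :nx], Qf[:nx, nx:nx + nz], Qf[:nx, nx + nz:]
Q22, Q23 = Qf[nx:nx + nz, nx:nx + nz], Qf[nx:nx + nz, nx + nz:]
Q33 = Qf[nx + nz:, nx + nz:]

C = rng.standard_normal((nu, nx))   # stand-in for the gain C
D = rng.standard_normal((nu, nz))   # stand-in for the gain D
x = rng.standard_normal(nx)


def f(z):
    """Six-term quadratic of (C.7) evaluated with u = C x + D z."""
    u = C @ x + D @ z
    return (x @ Q11 @ x + 2 * x @ Q12 @ z + 2 * x @ Q13 @ u
            + z @ Q22 @ z + 2 * z @ Q23 @ u + u @ Q33 @ u)


# Coefficients of the reduced quadratic z^T R z + 2 z^T S x + x^T T x.
R = D.T @ Q33 @ D + Q23 @ D + (Q23 @ D).T + Q22
S = (C.T @ Q33 @ D + Q13 @ D + (Q23 @ C).T + Q12).T
T = C.T @ Q33 @ C + Q13 @ C + (Q13 @ C).T + Q11

z_star = -np.linalg.solve(R, S @ x)                  # from R z + S x = 0
f_star = x @ (T - S.T @ np.linalg.solve(R, S)) @ x   # value at z_star
print(np.isclose(f(z_star), f_star))
```

Under these assumptions $R$ is positive definite, so $z^*$ is the unique minimizer and any perturbation of it increases the cost.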

D 

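A more detailed derivation of (5.5), (5.6) and (5.7) is given in this appendix. Let $Y = [H_t^T (\theta \Theta_t)^{-1} H_t + (\theta V_t)^{-1} + \Pi_t]^{-1}$ and substitute (3.24) into (5.4):

$$\begin{aligned}
&\operatorname*{ext}_{z_t} \left\{ -\left[ (\theta V_t)^{-1} \hat{x}_t + H_t^T (\theta \Theta_t)^{-1} z_t \right]^T Y \left[ (\theta V_t)^{-1} \hat{x}_t + H_t^T (\theta \Theta_t)^{-1} z_t \right] + z_t^T (\theta \Theta_t)^{-1} z_t + \hat{x}_t^T (\theta V_t)^{-1} \hat{x}_t \right\} \\
&= \operatorname*{ext}_{z_t} \left\{ z_t^T \left[ (\theta \Theta_t)^{-1} - (\theta \Theta_t)^{-1} H_t Y H_t^T (\theta \Theta_t)^{-1} \right] z_t - 2 \hat{x}_t^T (\theta V_t)^{-1} Y H_t^T (\theta \Theta_t)^{-1} z_t \right. \\
&\qquad \left. - \hat{x}_t^T (\theta V_t)^{-1} Y (\theta V_t)^{-1} \hat{x}_t + \hat{x}_t^T (\theta V_t)^{-1} \hat{x}_t \right\}
\end{aligned} \tag{D.1}$$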
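By using (D.1) and the matrix inverse identity (B.1), (5.5) is obtained. The first-order stationary condition
of $z_t$ in (5.5) is

$$z_t^* = \left[ \theta \Theta_t + H_t ((\theta V_t)^{-1} + \Pi_t)^{-1} H_t^T \right] (\theta \Theta_t)^{-1} H_t Y (\theta V_t)^{-1} \hat{x}_t \tag{D.2}$$

and can be simplified as

$$\begin{aligned}
z_t^* &= \left\{ H_t + H_t \left[ (\theta V_t)^{-1} + \Pi_t \right]^{-1} H_t^T (\theta \Theta_t)^{-1} H_t \right\} Y (\theta V_t)^{-1} \hat{x}_t \\
&= H_t (I + \theta V_t \Pi_t)^{-1} \hat{x}_t
\end{aligned}$$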

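The last equation is (5.6). Substituting $z_t^*$ in (D.2) into (5.5) gives

$$\begin{aligned}
&-\hat{x}_t^T (\theta V_t)^{-1} Y H_t^T (\theta \Theta_t)^{-1} \left[ \theta \Theta_t + H_t ((\theta V_t)^{-1} + \Pi_t)^{-1} H_t^T \right] (\theta \Theta_t)^{-1} H_t Y (\theta V_t)^{-1} \hat{x}_t \\
&\quad - \hat{x}_t^T (\theta V_t)^{-1} Y (\theta V_t)^{-1} \hat{x}_t + \hat{x}_t^T (\theta V_t)^{-1} \hat{x}_t
\end{aligned} \tag{D.3}$$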

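Let

$$\begin{aligned}
Z &= H_t^T (\theta \Theta_t)^{-1} \left[ \theta \Theta_t + H_t ((\theta V_t)^{-1} + \Pi_t)^{-1} H_t^T \right] (\theta \Theta_t)^{-1} H_t \\
&= H_t^T (\theta \Theta_t)^{-1} H_t + H_t^T (\theta \Theta_t)^{-1} H_t \left[ (\theta V_t)^{-1} + \Pi_t \right]^{-1} H_t^T (\theta \Theta_t)^{-1} H_t \\
&= H_t^T (\theta \Theta_t)^{-1} H_t - H_t^T (\theta \Theta_t)^{-1} H_t \left[ H_t^T (\theta \Theta_t)^{-1} H_t - Y^{-1} \right]^{-1} H_t^T (\theta \Theta_t)^{-1} H_t
\end{aligned}$$

By using (B.1)

$$Z = \left[ (H_t^T (\theta \Theta_t)^{-1} H_t)^{-1} - Y \right]^{-1},$$

then (5.7) is obtained from (D.3).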

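$$\begin{aligned}
&-\hat{x}_t^T (\theta V_t)^{-1} \left\{ Y \left[ (H_t^T (\theta \Theta_t)^{-1} H_t)^{-1} - Y \right]^{-1} Y \right\} (\theta V_t)^{-1} \hat{x}_t - \hat{x}_t^T (\theta V_t)^{-1} Y (\theta V_t)^{-1} \hat{x}_t + \hat{x}_t^T (\theta V_t)^{-1} \hat{x}_t \\
&= -\hat{x}_t^T (\theta V_t)^{-1} \left[ Y^{-1} - H_t^T (\theta \Theta_t)^{-1} H_t \right]^{-1} (\theta V_t)^{-1} \hat{x}_t + \hat{x}_t^T (\theta V_t)^{-1} \hat{x}_t \\
&= -\hat{x}_t^T (\theta V_t)^{-1} \left[ (\theta V_t)^{-1} + \Pi_t \right]^{-1} (\theta V_t)^{-1} \hat{x}_t + \hat{x}_t^T (\theta V_t)^{-1} \hat{x}_t \\
&= \hat{x}_t^T (\theta V_t + \Pi_t^{-1})^{-1} \hat{x}_t = \hat{x}_t^T \Pi_t (\theta V_t \Pi_t + I)^{-1} \hat{x}_t
\end{aligned}$$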
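
The chain of identities in this appendix can be verified numerically. The sketch below assumes the quadratic in $z_t$ takes the form of (D.1) with $Y = [H_t^T (\theta \Theta_t)^{-1} H_t + (\theta V_t)^{-1} + \Pi_t]^{-1}$, and uses random, hypothetical data with $\theta > 0$ so that "ext" is a minimum. It compares the cost at $z_t^* = H_t (I + \theta V_t \Pi_t)^{-1} \hat{x}_t$ with the closed form $\hat{x}_t^T \Pi_t (\theta V_t \Pi_t + I)^{-1} \hat{x}_t$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 2  # hypothetical state and measurement dimensions


def random_spd(k):
    """Return a random symmetric positive definite k x k matrix."""
    a = rng.standard_normal((k, k))
    return a @ a.T + k * np.eye(k)


theta = 0.3                                   # assumed risk parameter
V, Pi, Theta = random_spd(n), random_spd(n), random_spd(m)
H = rng.standard_normal((m, n))               # stand-in for H_t
x_hat = rng.standard_normal(n)                # stand-in for the estimate

tV_inv = np.linalg.inv(theta * V)             # (theta V_t)^{-1}
tTh_inv = np.linalg.inv(theta * Theta)        # (theta Theta_t)^{-1}
Y = np.linalg.inv(H.T @ tTh_inv @ H + tV_inv + Pi)


def cost(z):
    """Quadratic in z from the first line of (D.1)."""
    b = tV_inv @ x_hat + H.T @ tTh_inv @ z
    return -b @ Y @ b + z @ tTh_inv @ z + x_hat @ tV_inv @ x_hat


core = np.linalg.solve(np.eye(n) + theta * V @ Pi, x_hat)
z_star = H @ core            # simplified stationary point, as in (5.6)
value = x_hat @ Pi @ core    # x_hat^T Pi (theta V Pi + I)^{-1} x_hat
print(np.isclose(cost(z_star), value))
```

For $\theta > 0$ the quadratic is convex in $z_t$, so perturbing $z_t^*$ can only increase the cost, which gives a second quick check on the stationary point.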